ABSTRACT


Currently systematic techniques for assessing macro mechanisms for transferring 
software engineering technologies are non-existent. This leads to inefficient allocation of 
research resources and increased risk to software technology intensive programs. 
Consequently, software technology transition today is an ill-defined, non-repeatable, and 
inefficient process for bringing advanced software engineering technologies to market. 

The essence of this research is defining an engineering model for an evolving 
software process. The contribution can be summarized as developing the relationships of 
information “temperature” (°Saboe), entropy, pressure, volume (nodes) and the conserved
property - information in terms of messages. This ties together for the first time, 
information theory, chaos control dynamical systems, statistical mechanics and software 
engineering. 

This dissertation develops an engineering model and the relationships of various 
controlling parameters in an evolutionary process. Cast in terms of new technology 
transfer (TechTx) models for analysis, it is able to predict and prescribe action for a research
or program manager. Each model deals with entropy as defined in information theory.
The TechTx Basic
Entropy model developed addresses macro level trends of a technology at the community 
level. The TechTx Entropy Feedback model is based on non-linear control theory. 

The controlling parameter of the evolutionary process is suggested to be
information temperature (°Saboe), which is developed four different ways: first, by
comparing the slopes of the controlled property (information in terms of messages);
second, using a one-dimensional system of non-linear dynamical equations; third,
with a two-dimensional system of equations; and fourth, by using the partition function. With
the partition function, the conserved property is allocated to sets of sets in a power set. A
probability distribution is developed for discrete message levels, called “q-levels”. Each
discrete q-level contains a count of the micro-states of primitive messages in that
partition: q-level=1 consists of single terms in a set, q-level=2 is a set of sets consisting
of pairs of terms, q-level=3 is a set of sets comprised of three terms, and so on. A
relationship to the Weibull distribution function is shown.
Four Views of the Controlling Parameter - Information Temperature


The q-level primitive message micro states, empirical data, is related to the 
partition function, which is found to have a temperature term as the controlling 
parameter. This result is due to the normalizing condition being primitive message per 
unit volume. A unit volume in the control space is a performing node. Empirical data for 
Ada and Java show that the information temperature is similar to that of the ideal gas law. 
Temperature is proportional to pressure, which can be found to be messages per node. 

It is suggested that “the fundamental” units of temperature are in information
units.

A most interesting development is the relationship that appears to exist between
the two-dimensional system of non-linear dynamical equations representing deterministic
chaos and the general form of the baker's transformation. The baker's transformation is a
general form of a Bernoulli shift, and has been suggested to represent deterministic chaos
in evolving processes (Prigogine 1983, 1997). Unlike Prigogine’s work, this research
suggests for the first time that a system of equations which includes both an abstract
representation of a conserved property (information in terms of primitive messages) and
entropy (in information units of bits) has a relationship to a controlling intensive
variable - temperature in information units.

The research includes a comprehensive review of the state-of-the-art in software 
technology transfer. This summary focuses on the elements of technology transfer 
required to model the technology transfer process. Specifically, this research develops 
the fundamentals for a rigorous software technology transfer model as required by the 
TechTx Entropy Feedback model. The relationship of entropy (S_H) as defined for
information by Shannon, and the eigenvalue, or the norm of a dynamical system, is 
explored. The Lyapunov number is a natural measure developed from the eigenvalue of 
a dynamical system, e.g. related to entropy. The significance of the eigenvalue for a 
communications software technology transfer model is discussed. The result of this 
research is the definition of an engineering model for an evolving software process. 

The mechanisms are developed utilizing information theory, communication 
theory, chaos control theory, statistical mechanics, and learning curve principles. The 
combination of those scientifically sound mechanisms provides a basis for assessing, 
and/or prescribing a portfolio of technologies and the implementing macro infrastructure. 
This provides the theoretical framework for a practical method for a program manager 
to establish a high capacity transition channel, which can accelerate technology 
maturation and insertion. The significance of the eigenvalue of the dynamical system is 
discussed and related to the Lyapunov exponent and number to indicate stability. The 
relationship to pressure on the community, and a temperature of the technology process 
is developed. An engineering model results using a state equation similar to that used by 
engineers to define a process cycle. The result is useful to program managers, policy 
makers and practitioners in analyzing and prescribing a process for the evolution of a 
technology. It is speculated that the state relationships of the Technology Dynamics 
model can be used to model any evolutionary process and software itself is a special case 
of the model. Finally, it is suggested that this is the engineering and mathematical basis 
for software physics. Data samples assess the following technologies: software 
engineering, software technology transfer, Ada, Java, abstract data types, rate monotonic 
analysis, cost models, software standards, and software work breakdown structures. Also 
included is an extensive annotated bibliography on software technology transfer and
related references, and a bibliography including related material from philosophy, 
psychology, math, physics, thermodynamics, management, economics, game theory, 
technology transfer, software engineering, and systems engineering. 

Let’s set a context. 

Induction is a process of inferring a general law or principle from the observations 
of particular instances. This is inductive inference. Inductive reasoning is a more general 
concept than inductive inference. It is a process of assigning a probability (or credibility) 
to a law or proposition from observation of particular instances. Inductive inference 
draws conclusions on rejecting or accepting a proposition, possibly without total
justification. Inductive reasoning only changes the degree of our belief in a proposition.
Deductive reasoning or inference derives the absolute truth or falsehood of a proposition.
This is a case of inductive reasoning. This approach to explaining things around us dates 
back at least to Epicurus (342?-270?BC) (Li 1993, p. 274). Let’s consider theory 
formulation in science as the process of obtaining a compact description of past 
observations together with future ones. 

Let us suggest that the preliminary data of an investigator, the hypothesis
proposed, the experimental design and setups, the trials performed, the outcomes
obtained, the new hypothesis formulated, etc., can be encoded as an initial segment of an
infinite binary sequence. The investigator obtains increasingly longer initial segments of
an infinite binary sequence by performing more and more experiments. To describe the
underlying regularity in the sequence, the investigator tries to formulate a theory that
governs the sequence on the basis of the outcome of past experiments. Candidate
theories or hypotheses are identified from the sequences starting with the observation of
the initial segment.

There are many different possible infinite sequences or histories that the 
investigator can embark on. The phenomenon the investigator is trying to understand or 
the strategy used can be stochastic. In this type of view, a phenomenon can be identified 
with a measure, i.e. probability distribution, on a continuous sample space. 
This research attempts to express the task of learning a certain concept in terms
of sequences over a basic alphabet. We express what we know as a finite sequence over
the alphabet. An experiment to acquire more knowledge is encoded as a sequence over the
alphabet, the outcome is encoded over the alphabet, new experiments are encoded over
the alphabet, and so on. This way we can view a concept as a probability distribution
(measure) over a sample space of all one-way infinite binary sequences. Each sequence
corresponds to one never-ending sequential history of conjectures, refutations, and
confirmations. The distribution can be said to be the concept or phenomenon involved.
Given an initial segment, we can predict what is likely to turn up next. Using Bayes' rule
for conditional probability, we can predict and extrapolate future outcomes. This is the
general thrust of this research. 

Hope you find this interesting. There is a lot more here than meets the eye. 
Laurie, and the rest of you who have prayed for me,
thank you all for opening my mind, my heart and my soul. 
*Hafiz, whose given name was Shams-ud-din-Muhammad, is the most beloved poet of
Persia. Born in Shiraz, he lived at about the time of Chaucer in England and about a 
hundred years after Rumi. He spent nearly all his life in Shiraz, where he became a 
famous Sufi master. When he died, he was thought to have written an estimated 5000 
poems of which 500 to 700 have survived. His Divan (collected poems) is a classic in the 
literature of Sufism. The work of Hafiz became known in the West largely through the 
efforts of Goethe, whose enthusiasm rubbed off on Ralph Waldo Emerson, who translated 
Hafiz in the nineteenth century. Hafiz's poems were also admired by such diverse writers
as Nietzsche, Pushkin, Turgenev, and Garcia Lorca; even Sherlock Holmes quotes Hafiz
in one of the stories by Arthur Conan Doyle. In 1923, Hazrat Inayat Khan, the Indian
teacher often credited with bringing Sufism to the West, proclaimed that "the words of
Hafiz have won every heart that listens."

Hafiz's poetry is rooted in the beautiful human need for companionship and in the soul's 
innate desire to surrender all experience -- except Light. The verses speak on many 
levels simultaneously, though they are crafted with such brilliance that rarely does one feel
left out.

People from many religious traditions share the belief that there are always living 
persons who are one with God. These rare souls disseminate light upon the earth and 
entrust the Divine to others. Hafiz is regarded as one who came to live in that Sacred 
union, and sometimes his poems speak directly to that experience. 

If God wanted, He could give Himself entirely to someone without diminishing His own 
state. And if you were the recipient of that Divine Gift -- what would you then know? 

Rumi, Kabir, Saadi, Shams, Francis of Assisi, Ramakrishna, Nanak, Milarepa, and Lao-tzu
are among the many known to have achieved perfection or Union because of their
extraordinary romance with the Beloved. They are sometimes called the "realized souls"
or "Perfect Masters." My Father, Michael S. Saboe Sr., was one of those. He, like
my other mentors and advisors, has greatly enabled the work I have done.

As Hafiz wrote: 

The voice of the river that has emptied into the Ocean 
Now laughs and sings just like God. 
I. INTRODUCTION


A. GOALS AND PROPOSED NEW CONTRIBUTION 
1. The Problem and Goals 

Software technology transition today is an ill-defined, non-repeatable, and
inefficient process for bringing advanced software engineering technologies to market.


The goal of this research is to develop the basic 
elements for an industrial model of a software technology 
transition engine that establishes a high capacity transition 
channel, which accelerates technology maturation and 
insertion. 

The top level requirement of the model is to minimize the amount of effort 
required to turn an idea into reality. A set of concepts is introduced that are cycle,
application and technology independent. This research presents a general set of models, 
with underlying independent and dependent variable relationships for software 
technology transition. The model is an engineering model in the full sense. The 
underlying model is as robust as any thermodynamic or physics model. It represents a 
closed form of interrelated equations that are brought to the software engineering 
discipline for the first time. These models provide a method to analyze and later 
prescribe the size of a research transition infrastructure and the probability of a 
technology maturing at a given time. Further, the engineering and mathematical 
relationships appear to be applicable to any evolutionary process (e.g. software
development) and potentially to software itself.

This research dissertation develops the elements of three new technology transfer 
models that can be represented mathematically. This provides a method for analysis for 
both predictive and prescriptive activities. All of the existing work in software 
technology transfer appears to lack mathematical models. The three technology transfer 
models addressed are: 1) TechTx Basic Entropy, 2) TechTx Entropy Feedback, and 3) a
suggested TechTx Entropy Learning Curve.

The basic model analyzes the entropy 1 of terms relating to technology messages
published 2 over time. This model is compared to a baseline model, messages vs. time,
used in the diffusion of information research literature. The second model is at the
organizational node or sub-node level and gives the basis for analyzing macroscopic and
local interactions in a process. The third model suggests the incorporation of learning
curves at the organizational node level. This model is refined to incorporate both entropy
and learning. Each of these models represents a refinement of the predecessor model.
For example, 3) TechTx Entropy Learning Curve builds from 2) TechTx Entropy
Feedback, which is an extension of 1) TechTx Basic Entropy. The mathematical
implications of the third model are suggested. While all three models represent an
extension to the state-of-the-art, the second model, TechTx Entropy Feedback, provides the
basis for an entire set of engineering tools to permit analysis of evolving processes.
This model is validated and the results of over 100,000 data points yield a confidence
interval of less than ±0.3%.
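As an illustration of the kind of quantity the basic model tracks, the following minimal sketch computes the Shannon entropy (in bits) of the terms appearing in published messages, period by period. The record layout, the function names, and the toy data are assumptions made only for this example; they are not the data sets or procedures used in this research.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy H = -sum(p * log2(p)), in bits, of a list of counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_by_period(records):
    """Return {period: entropy of the terms published in that period}.

    records: iterable of (period, term) pairs, one pair per published message.
    This layout is a hypothetical stand-in for a bibliographic data set.
    """
    terms_per_period = {}
    for period, term in records:
        terms_per_period.setdefault(period, Counter())[term] += 1
    return {
        period: shannon_entropy(counter.values())
        for period, counter in sorted(terms_per_period.items())
    }


# Hypothetical toy data: each pair is one message mentioning one term.
records = [
    (1995, "ada"), (1995, "ada"),
    (1996, "ada"), (1996, "java"),
    (1997, "ada"), (1997, "java"), (1997, "java"), (1997, "java"),
]
trend = entropy_by_period(records)
```

In this toy series the entropy is zero when a single term dominates a period and reaches one bit when two terms are equally represented, which is the type of macro-level trend a message-count baseline alone would not show.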

The key underlying communication diffusion research of Rogers (Rogers 1983, 
1995) is pervasive in the more specific study of software technology transfer (see Buxton 
1991, Raghavan 1988, 1989, Fichman 1993, 1994, Jaakkola 1995, Fowler 1994, Pfleeger 
1999, and many more). The research in this dissertation suggests preliminary analysis of 
the basic elemental tools required for a software technology transition cycle analysis 
approach. 

This work is motivated (see B. Motivation and Significance of the Problem, 
p4) by the need for an acquirer, or research program manager, to assess risk related to the 
maturity date of a technology. Data and charts that summarize relevant aspects of this 

1 Entropy (from the Greek en, in + trope, turning) comes from the conviction that the future
will not repeat the past, that time moves unidirectionally, and the world is moving on (Nash 1974). By
always increasing in the direction of spontaneous change, entropy indicates the “turn,” or direction, taken
by all such change.

2 Unless otherwise noted: the words “publish”, and “sent” are used interchangeably; the words 
“message”, “publication”, and “record” are synonymous. Publishing a message is the same as performing a 
task to develop a message. 
work are presented. A sample data set for “software engineering” is plotted.
“Technology Transition Models” (see A. Technology Transfer Model Features, p31)
then summarizes the specific software technology transfer literature. Most of this 
literature addresses the implementation details required to address software technology 
transition. 

With the principal relationships of the models developed, the research suggests
methods to construct and analyze a design for a technology transfer engine. The design 
can provide prescriptive insight to a program manager or research manager, as to how to 
best configure a research program to achieve stability, confidence, and earliest 
convergence. 
B. MOTIVATION AND SIGNIFICANCE OF THE PROBLEM

At the International Conference on Software Engineering 2001, the keynote 
speech (Shaw 2001) illustrated the trends in maturation of software technology. The 
model cited was one from 1984 (Redwine 1984). That model, while the result of an 
interesting set of case studies at that time, provides no prediction capability. It only 
identifies a set of state transition points, to tag an historical analysis in other case studies. 
Two of the states identified in that model are not identifiable in any consistent manner, 
and have been questioned in the literature (Pfleeger 1999), (Saboe 2001). 

Current military applications typically push high performance technology without
large consideration given to cost. On the other hand, commercial enterprise applications 
are very much interested in producing a product with reduced cost, increased 
responsiveness to market pressures, and reduced cycle time to product delivery. 

The current model in use in the United States features the National Science 
Foundation (NSF) and Department of Defense (DoD) as major contributors to the 
advancement of software technology (e.g. NSF, Defense Advanced Research Projects 
Agency and Service Laboratories). There has not been a focused national 
implementation effort in the high technology area of software engineering, although it 
has become a national agenda item (Boehm, Basili 2000). 

The approach to date has been criticized for decades in numerous government 
reports and in the literature (DSB 2000). Under the current approach, advances in software
engineering technology are a by-product of some advanced technology development effort
that focuses a narrow light on the requirements of the target system. The large ticket 
NSF, DARPA, and Service lab efforts in software engineering tend to move in parallel to 
advanced system developments. Historically, these efforts are always looking for a home 
and an insertion point. Yet, the product developers desire mature technologies that work 
well in the field, not whiz-bang lab tools that work fine only in the fabricated 
demonstrations. This poses a problem for efficient, consistent insertion. It also 
highlights a waste of national intellectual capital. 
In the commercial model, let’s simply look at the challenges of Microsoft and
Netscape. The competitive challenges as well as the challenges of immature technologies 
that are rapidly emerging as standards (Cusumano 1995 and 1998) brightly illustrate the 
obvious. Industry needs a better model for inserting technology as well. Another 
development is the general movement to standards-based software (Jovanovic 1999). 
Not only are we moving towards open standards and infrastructure in software 
applications, but also in vehicles with embedded software, and in the software 
engineering organization. It turns out that the weak area in the Redwine model is exactly 
in the area related to diffusion, or popularization to the broader population. This 
popularization phase is the point where the standardization phase of a technology occurs. 
C. DESIRABLE FEATURES AND DEFICIENCIES


It seems reasonable to define some desirable characteristics of a good software 
engineering technology transition model. The model should enable the software research 
and program manager community to quantify the maturation of a technology (or portfolio 
of technologies), and the uncertainty in the arrival time of the technology. With the 
appropriate analytical model, we should be able to manipulate the model to enable 
adjustments and prescriptions. The primary reason to analyze, adjust and prescribe is to
reason about ways to reduce relative risk and uncertainty, and accelerate the arrival of a 
technology for use in a program. 

After a careful review of the literature, it seems apparent that a good model for 
technology maturation and transition is lacking for software engineering. There are no 
references in the software technology transition literature that indicate that a model for
analyzing, predicting, and prescribing maturation, stability, and confidence in the 
evolution of a technology exists. There is a clear need based on the researcher’s 
extensive personal experience (nearly 30 years at every level of industry and the 
Department of Defense). Discussions with the software technology transition program at 
the Software Engineering Institute (SEI), consistently indicated that there is a critical lack 
of and need for an analytical model of the type proposed. The elements of such a 
proposed analytical model promise to permit analysis of various alternatives for policy 
and investment trades. Tools that build on this analysis approach can help identify 
leverage points and opportunities to accelerate progress in a repeatable and rigorous 
process enabling quantification of maturity at a given date and confidence in a subject 
technology's stability.

With such tools, a decision-maker can determine the confidence with which a 
technology or group of technologies will stabilize and converge in a given time frame. 
For example (see Figure 1-1), a risk assessment for a program might expect a
portfolio of technologies to arrive by year 06 with an 80% certainty, but the model might
show that in 06 there is only 60% certainty of the portfolio being available given current trends.
The desired 80% certainty would not be available until 08. For the desired system, or
macroscopic, curve, we can algebraically solve for the node response curve(s). The
model can then be used for prescriptive purposes. This enables trades to determine how
many and whether parallel or serial tasks are required. If the technology is not predicted 
to arrive as required, the model will point to the areas for remedy with a prescriptive 
solution to organize, train and equip an organization in order to change the confidence of 
arrival of the technology for the program’s required schedule. 
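The risk assessment example above can be sketched numerically. The sketch below assumes, purely for illustration, that the cumulative certainty of arrival follows a Weibull-shaped curve fitted through the two points quoted in the example (60% in year 06, 80% in year 08). The curve family, the function names, and the use of years counted from 00 are assumptions of this sketch and do not represent the state equations developed later in this work.

```python
import math


def fit_weibull(t1, f1, t2, f2):
    """Return (shape, scale) of the Weibull CDF passing through two points.

    Uses the linearization ln(-ln(1 - F)) = shape * (ln t - ln scale).
    """
    y1 = math.log(-math.log(1.0 - f1))
    y2 = math.log(-math.log(1.0 - f2))
    shape = (y2 - y1) / (math.log(t2) - math.log(t1))
    scale = math.exp(math.log(t1) - y1 / shape)
    return shape, scale


def certainty_at(t, shape, scale):
    """Cumulative certainty that the technology has arrived by time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def year_for_certainty(f, shape, scale):
    """Inverse of certainty_at: the time at which certainty f is reached."""
    return scale * (-math.log(1.0 - f)) ** (1.0 / shape)


# Hypothetical current-trend curve through the two points of the example.
shape, scale = fit_weibull(6.0, 0.60, 8.0, 0.80)

required_year = 6.0                      # program office wants year 06
predicted_year = year_for_certainty(0.80, shape, scale)
schedule_gap = predicted_year - required_year
```

With these assumed inputs the gap between the required and predicted arrival dates is two years, which is the quantity a program manager would try to close by changing the node configuration so that the curve shifts to the left.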


[Figure 1-1: Program Office Use for Risk Assessment and Rx. Example: the program office wants the technology by 06 with 80% certainty; analysis indicates 60% certainty in 06 and 80% certainty in 08. What nodes and programmatics need to be put into place to shift the curve to the left? From the desired system curve, algebraically solve for the node response curve(s), and determine how many nodes are needed and whether they operate in parallel or in series.]

Figure 1-1 Program Office Use of Objective Model
D. RESEARCH APPROACH

1. Rational and Experiential Analysis 

The study of abstract thought has persisted and evolved along with the emergence 
of experimentalism. A well-known marker in the intellectual history of this study is Rene 
Descartes. Historically, it has been thought that Descartes established the proper method 
of inquiry with his statement Cogito, ergo sum, "I think, therefore I am" 3. Roger Bacon
(Haskins 1927) and, later, Newton 4, Locke, Berkeley 5, and others brought us to the sensible
experiential flavors. We review the development of this merging of philosophy, math, 
physics, and metaphysics with the practical experimental methods we use as engineers. 

As engineers, we assimilate, combine and produce. Good technology is contrived 
to fulfill a human need. That is why it satisfies more than function. This research 
assumes, as a basic premise, that software technology transfer is not significantly 
different from the development of knowledge in other disciplines. The subject matter, or 
domain, is different, but the constructs used by humans to formulate physical or 
experimental knowledge are similar. The game, then, is to meld the logical-mathematical 
philosophical musings represented in a model with information gathered to validate the
model in order to reduce uncertainty, and to communicate the results. Thus, we have the
intention of diffusing the information to the society or subsets of the society (receptors)
that use the information gathered in the development or extension of a technology, and to


3 It is often misunderstood that this statement represented the "proof" of his existence, rather than the method
of rational analysis and an examination approach devoid of the defects of perceptions. Even with rigorous
experimentation our "perceptual" and "sensual" observations of associations of properties are often fooled. 
This discussion turns up repeatedly through the history of science. Even the defeat of pure skepticism 
occurs due to uncertainty. It is a curious aside to note that it was not until the early twentieth century that 
the scientific method evolved to the point of rejecting the null hypothesis. 

4 There are two linkages here, the 1st law and state. Newton, when formulating his laws, was
improving on Descartes’ Principia. Newton learned about the law of inertia from Descartes. In fact, it is
the first law in both the Principia of Descartes and the Principia of Newton, and both deal with
“continuous” acting forces. From Descartes’ presentation of the law, Newton learned the important
concept of motion as a “state” (status) (Newton 1726, p46). He developed the 2nd law, which sets forth a
proportionality between a “force” and a “change of motion.” In this law, “force” means an impulse, a discrete
force. The 1st law was formulated (as a hypothesis) to allow for the condition that there were certain
[continuous] insensible forces that are otherwise not known to us (Newton 1726, p110). We could
speculate that there could be a counterpart today for discrete “forces” not otherwise known to us - say an
information force.

5 Berkeley gives us the saying that goes like this: if a tree falls in the woods and no one hears it, does it
make a noise? A message is communicated only if there is a receiver to receive it.
the subsets (consumers) that would use a technology. The research in this dissertation, in
a limited sense, is studying that process itself. 

2. Context and Overview 

Let’s set a context. Induction 6 is a process of inferring a general law or principle 
from the observations of particular instances. This is inductive inference. Inductive 
reasoning is a more general concept than inductive inference. It is a process of assigning 
a probability (or credibility) to a law or proposition from observation of particular 
instances. Inductive inference draws conclusions on rejecting or accepting a proposition, 
possibly without total justification. Inductive reasoning only changes the degree of our
belief in a proposition. Deductive reasoning or inference derives the absolute truth or
falsehood of a proposition. This is a case of inductive reasoning.

This approach to explaining things around us dates back at least to Epicurus 
(342?-270?BC) (Li 1993, p. 274). Let’s consider theory formulation in science as the 
process of obtaining a compact description of past observations together with future ones. 
Let us suggest that the preliminary data of an investigator, the hypothesis proposed, the 
experimental design and setups, the trials performed, the outcomes obtained, the new 
hypothesis formulated, etc., can be encoded as an initial segment of an infinite binary 
sequence. The investigator obtains increasingly longer initial segments of an infinite 
binary sequence by performing more and more experiments. To describe the underlying 
regularity in the sequence, the investigator tries to formulate a theory that governs the 
sequence based on the outcome of past experiments. Candidate theories or hypotheses 
are identified from the sequences starting with the observation of the initial segment. 

There are many different possible infinite sequences or histories on which the 
investigator can embark. The phenomenon the investigator is trying to understand or the 
strategy used can be stochastic. In this type of view, a phenomenon can be identified 
with a measure, i.e. probability distribution, on a continuous sample space. 


6 The Oxford English Dictionary defines induction this way. 
This research attempts to express the task of learning a certain concept in terms of
sequences over a basic alphabet. We express what we know as a finite sequence over the
alphabet. An experiment to acquire more knowledge is encoded as a sequence over the
alphabet, the outcome is encoded over the alphabet, new experiments are encoded over
the alphabet, and so on. This way we can view a concept as a probability distribution
(measure) over a sample space of all one-way infinite binary sequences. Each sequence
corresponds to one never-ending sequential history of conjectures, refutations, and
confirmations. The distribution can be said to be the concept or phenomenon involved.
Given an initial segment, we can predict what is likely to turn up next. Using Bayesian
analysis (Bayes 1763) to compute the conditional probability, we can predict and 
extrapolate future outcomes. This is the general thrust of this research. 
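A minimal sketch of this idea follows. It assumes, for illustration only, that the competing hypotheses are a small finite set of Bernoulli sources, each a fixed probability of emitting a 1, with a uniform prior. The hypothesis set, the function names, and the observed segment are hypothetical; the sketch shows only the mechanics of updating belief from an initial segment and predicting the next symbol.

```python
def posterior(prior, thetas, bits):
    """Update hypothesis weights with Bayes' rule, one observed bit at a time.

    prior:  initial probability of each hypothesis (sums to 1)
    thetas: each hypothesis' probability of emitting a 1 (strictly between 0 and 1)
    bits:   observed initial segment of the binary sequence
    """
    weights = list(prior)
    for bit in bits:
        # Likelihood of this bit under each hypothesis.
        weights = [
            w * (theta if bit == 1 else 1.0 - theta)
            for w, theta in zip(weights, thetas)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalize
    return weights


def predict_next_one(weights, thetas):
    """Posterior predictive probability that the next bit is 1."""
    return sum(w * theta for w, theta in zip(weights, thetas))


# Hypothetical hypotheses and observations.
thetas = [0.1, 0.5, 0.9]
prior = [1.0 / 3.0] * 3
observed = [1, 1, 1, 0, 1]

post = posterior(prior, thetas, observed)
p_next = predict_next_one(post, thetas)
```

As the observed segment grows, the weights concentrate on the hypotheses that best compress the history, and the prediction for the next symbol follows from the conditional probability rather than from accepting or rejecting any single hypothesis outright.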

Let’s develop an analogy of the flow of communication to a physical model to 
illustrate the concept. When two people meet, they converse, and consequently modify 
their thinking to some extent. These modifications are brought to subsequent meetings 
and modified further. The word for this is dissemination or diffusion. There is a flow of 
communication in society, just as there is a flow of correlations in matter. Let’s explore 
this idea of correlations using the analogy of a physical system and look at what happens 
in terms of distribution functions. 

Consider a glass of water. We may visualize the interactions as leading to 
collisions between the molecules. We can describe the water containing them in terms of 
a statistical ensemble. The water is not aging if we were to consider the individual 
molecules over geologic time 7 . Yet, there is a natural time order in the system from a 
statistical point of view. Aging is a property of populations, as in the biological theory of 
evolution as developed by Darwin. It is a statistical distribution that approaches the 
equilibrium distribution. 

7 Newton’s scholium differentiates time this way. “Time, space, place, and motion ... quantities are
popularly conceived solely with reference to the objects of sense perception. ... 1. Absolute, true,
mathematical time, in and of itself and of its own nature, without reference to anything external flows
uniformly and by another name it is called duration. Relative, apparent, and common time is any sensible
and external measure (precise or imprecise) of duration by means of motion; such a measure - for example,
a month, a year - is commonly used instead of true time.” (Newton 1726 p408). This annotated translation
keeps to Newton’s original language. Many translations have been modernized. These other
modernizations do not lend themselves to the rich abstract nature and subtleties that are important to this
research.
Consider a probability distribution p(x_1, x_2) dependent on the two variables x_1, x_2. 
If x_1 and x_2 are independent, we can factor p(x_1, x_2) = p_1(x_1) p_2(x_2). The probability 
p(x_1, x_2) is the product of the two probabilities. On the other hand, if p(x_1, x_2) cannot be 
factored, x_1 and x_2 are correlated (Bayes 1763 p299). Return to the glass of water 
molecules. The collisions between the molecules have two effects: they make the 
velocity distribution more symmetrical, and they produce correlations (see Figure 1-2). 
However, two correlated particles will eventually collide with a third one (see Figure 
1-3). Binary correlations are then transformed into ternary ones, and so on. Prigogine illustrated 
this molecular model, and it has been verified (Prigogine 1997 p79). 
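
The factoring test for independence can be made concrete with a small numerical check. This is a sketch under stated assumptions: the joint distribution is a finite table, and the function names (`marginals`, `is_independent`) and the two example tables are hypothetical.

```python
from itertools import product


def marginals(joint):
    """Return the marginal distributions p_1(x_1) and p_2(x_2) of a joint table."""
    p1, p2 = {}, {}
    for (x1, x2), p in joint.items():
        p1[x1] = p1.get(x1, 0.0) + p
        p2[x2] = p2.get(x2, 0.0) + p
    return p1, p2


def is_independent(joint, tol=1e-12):
    """True when p(x_1, x_2) = p_1(x_1) * p_2(x_2) for every pair of values."""
    p1, p2 = marginals(joint)
    return all(
        abs(joint.get((a, b), 0.0) - p1[a] * p2[b]) <= tol
        for a, b in product(p1, p2)
    )


# Hypothetical tables: the first factors, the second does not.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
correlated = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

print(is_independent(independent))  # True
print(is_independent(correlated))   # False
```

In the second table, knowing x_1 fixes x_2 completely, which is the tabular counterpart of the correlation created by a collision.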


Figure 1-2 Collisions and Correlations. Panels: Before Collision, After Collision. The collision of 
two particles creates a correlation between them (represented by the wavy line). 
(Source: After Prigogine 1997) 

Figure 1-3 Flow of Correlations (Source: After Prigogine 1997) 


We could conceive of inverse processes that make the velocity distribution less 
symmetrical by destroying correlations. Processes that invert the velocities of particles in 
a physical system, as in Figure 1-4, have been reproduced. However, this inverted flow of 
correlations can only be achieved for a short time, with limited numbers of particles. 
Then we again have a directed flow of correlations involving an ever-increasing number 
of particles, leading the system to equilibrium. 

We now have a flow of correlations that are ordered in time, just as there is a flow 
of communication in society. There is a method to describe this irreversibility. This 
statistical description is the dynamics of correlations leading to the equilibrium solution. 

In this research, we use messages instead of particles. This turns out to be a 
conserved quantity (conserved quantities shared between two systems need not be 
restricted to energy 8 , or mass, or volume, the conserved quantity could be a number of 

8 Energy is an interesting term. It is a primitive term. It is a mathematical abstraction that has no 
existence apart from its functional relationship to variables or coordinates that do have a physical 
interpretation and that can be measured (Abbott 1989 p1). The 1st law of thermodynamics is merely a 
measures, even money) (Yakavenko 2000) (Farmer 1999). We are concerned with a 
deterministic dynamical system as well as an especially simple type of dynamical system, 
both corresponding to dynamical system maps. Contrary to what occurs in ordinary 
dynamics, time in maps acts only at discrete intervals. Maps represent a simplified form 
of dynamics that makes it easy to compare the individual level of description 
(trajectories) with the statistical description (see Appendix A Information, Control 
Theory and Evolutionary Dynamical Systems Basics, p273) (Prigogine 1997 p81). 

It is not the place of this research to provide a mathematical formalism with 
theorems and lemmas. Rather, this research provides a heuristic solution. We do, 
however, want to recognize that the careful construction of the model aligns with very 
deep mathematical constructs. It is important to realize that the problem of correlations 
of information distributions and dynamical systems cannot be solved at the level of 
trajectories or individual particles. It can, however, be solved at the level of ensembles 9 . 
In the TechTx Entropy Learning Curve Model, the sample space is allocated to 
coarse-grained partitions. In this way, we can connect the dynamical and statistical views in a 
manner that is consistent with the newest chapters in math and physics. We are able to 
predict the speed at which the distribution approaches equilibrium and to establish the 
relationship of this speed with the Lyapunov exponent 10 . This is developed in Chapter 
III. 
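
To make the role of the Lyapunov exponent in a discrete-time map concrete, the sketch below estimates it for the logistic map x -> r x (1 - x). The logistic map is used here only as a familiar stand-in for a one-dimensional map; it is an assumption of this example and not the map developed in Chapter III. The function name and default parameters are hypothetical.

```python
import math


def lyapunov_logistic(r, x0=0.2, n_transient=1000, n_steps=20_000):
    """Estimate the Lyapunov exponent of the map x -> r * x * (1 - x).

    The exponent is the trajectory average of log|f'(x)|, with f'(x) = r * (1 - 2x).
    A negative value signals convergence of nearby trajectories, a positive
    value signals divergence.
    """
    x = x0
    for _ in range(n_transient):  # discard the initial transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n_steps):
        x = r * x * (1 - x)
        # Floor the derivative to avoid log(0) if x lands exactly on 0.5.
        total += math.log(max(abs(r * (1 - 2 * x)), 1e-300))
    return total / n_steps


print(lyapunov_logistic(2.5))  # negative: trajectories converge to a fixed point
print(lyapunov_logistic(3.9))  # positive: nearby trajectories diverge
```

For r = 2.5 the map settles on the fixed point x = 0.6, where the derivative is -0.5, so the estimate approaches ln(0.5).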


formal statement asserting that energy is conserved. This represents a primitive statement about a primitive 
concept. Moreover, both are linked. The 1st law depends on the concept of energy, and it is equally true 
that energy is the essential thermodynamic function precisely because it allows the formulation of the 1st 
law. 

9 Boltzmann was the first to show the relationship of trajectories in state space and ensembles. It is his 
work that is considered the first practical use of statistical mechanics. 

10 The Lyapunov exponent shows a divergence or convergence in dynamical systems. This identifies 
the signature of a deterministic dynamical system. 
Figure 1-4 Destruction of Correlations 
a) Particles (black points) interact with an obstacle (circle). Initially all of the particles have the 
same velocity. The collision varies the velocities and creates correlations between the particles and 
the obstacle. 
b) The opposite process. Consider the effect of velocity inversion as the result of the inverted 
collision: correlations with the obstacle are destroyed, and the initial conditions are recovered. 


3. Validation 

The research validation follows the strategy shown in Figure 1-5. For the proposed 
TechTx Basic Entropy model, the research asks, “X is a method of predicting technology 
maturity; can we do better using Y, the new model?” Validation compares the new model 
to the existing methods. Constructing the TechTx Entropy 
Feedback model was a more difficult challenge. Development was difficult due to the lack 
of previous work in the software community relating statistical mechanics, non-linear 
dynamical systems control theory and information theory. Validation proved 
straightforward, since the model lent itself to readily collecting samples to validate the equations, 
with thousands to over one hundred thousand data points. Here the research is asking, 
“Can it be done at all?” The TechTx Entropy Feedback model was developed and 
exceeded expectations. The model is exercised with data from the TechTx Basic Entropy 
model. The TechTx Entropy Learning Curve model is suggested from the results of the 
other models. The technology transfer maturation process is characterized by learning 
curves. The validation is of the form that Shaw used, “Look, it works!!” (Shaw 2001). 

Figure 1-5 Validation Strategy (Source: After Shaw 2001) 


The proposed model was compared with the traditional diffusion of innovations 
communication model to predict trends and the maturation of a technology. The 
traditional model is the baseline model and uses the message-counting method. The first 
proposed model is the TechTx Basic Entropy model. This is the first improvement over 
the traditional model and uses the content of the message, measured in the information 
dimension of entropy. The entropy is derived from the basic message-counting model, so 
the excellent predictions seen with the linear message-counting model are retained. 
Historically, entropy is represented in information units: bits. Essential elements related 
to entropy are addressed in Chapter II. These include 1. Probability, 2. Information, 
Uncertainty, 6. Stochastic Model and Markov Chains, and related concepts (see C. 
Statistical Elements of the Technology Transition Models, p69). Chapter III (see 
1. Entropy Review, p98) includes a brief review of entropy as used in information 
theory. 
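
As a reminder of how entropy in bits follows from message counts, the sketch below computes Shannon entropy from the observed frequencies of terms. It is a generic illustration, not the TechTx Basic Entropy model itself; the function name and the sample terms are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy_bits(terms):
    """Shannon entropy H = -sum(p * log2(p)) of the observed term frequencies."""
    counts = Counter(terms)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one term is required")
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical message terms: one term seen twice, two terms seen once each.
sample = ["ada", "ada", "java", "rma"]
print(shannon_entropy_bits(sample))  # 1.5
```

A sample dominated by one term has entropy near zero, while a sample spread evenly over N terms reaches the maximum of log2(N) bits.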

The TechTx Entropy Feedback model was compared with experimental data to 
validate the state equation relationships, information theory and dynamical systems 
equations. 

The last model is the TechTx Entropy Learning Curve model. It appears that the 
feedback model exhibits characteristics of learning curves. With the addition of the 
learning curve to the feedback model, this model suggests a method which determines the 
learning rates for organizations and researchers (on average) in performance bands of +/- 
1σ, +2σ, +3σ, and greater than 3σ. It is an extension of the TechTx Basic Entropy 
model. The TechTx Entropy Learning Curve model is not validated explicitly; however, 
the feedback model is tuned to equate entropy measured at the macro (system) level with 
entropy measured at the micro (organizational node) level. The result appears to be a 
learning curve. Using a transfer function, this tuning creates the relationship between the 
macro world entropy of the TechTx Basic Entropy model and the micro world entropy of 
an organization. 
E. OVERVIEW OF THE DISSERTATION STRUCTURE 


This research has progressed in a pattern typical of the history of the development 
of science throughout the ages. We first set an initial context and historical relations in 
Chapters I and II. The assessment of previous work in Chapter II introduces existing 
models used in technology transfer, then concentrates on the issue of software technology 
transfer. At the end of Chapter V, we speculate that the model is general enough to be 
applied to any technology, and should not be limited to the domain of software. Since the 
proposed model relies heavily on the concepts related to the learning curve, statistical 
mechanics and entropy, a review of these concepts is also developed. 

A table summarizing the various work and features is mapped to the proposed 
model contributions. Deficiencies in the current approach to software technology transfer 
are identified in each section of historical model review. In short, there has not been a 
systematic, mathematical approach focused on the technology transfer infrastructure. 
Most work has addressed implementation details. This effort focuses on the 
mathematical and logical models of the overall technology transition channel. 

We begin with the model development in Chapter III introducing information 
entropy, and learning curves. The steps include: 

1) Development of the macro/micro relationships of information entropy (which 
are related to statistical mechanics) for software technology transfer. 

2) Development of statistical mechanics and dynamical systems relationship to 
yield technology transfer dynamics models. 

The relationship of complexity of an input entropy and number of tasks required 
to reduce the time per unit task is developed in a stepwise fashion. The approach 
developed in Chapter III expands the basic linear model with a general form of non-linear 
components in a dynamical system model. The dynamical system models are developed 
first in one dimension (entropy), and then in two dimensions (entropy and number of tasks 
performed) in a time step. Here we are addressing two orthogonal views of 
complexity. On one axis, we find information content addressed by information theory, 
where generally the optimization is around minimizing program length and packing of 
sequences and patterns. On the other axis, we find complexity addressed by dynamical systems. 
Optimization along this axis is generally around minimizing the time needed to perform the 
process. This can also represent a state space of intensive and extensive variables. This 
will be discussed in Chapter II, with the review of the Statistical Elements of Technology 
Transfer, and in Chapter III. Combining these views permits development of a 
performance index, roughly in terms of tasks per unit time, to enable trades between 
program length and performance. The performance index coupled with the other views 
in state space provides the mechanism to determine the configuration of a technology 
transition channel or engine to mature a technology. 

A macro view of the system is developed. The macro view is related to the 
constituent micro (organizational node) level view. Discussion on the tuning of the 
model parameters of a learning curve at the node level to approximate the true system is 
developed. A three dimensional extension to the basic models which includes feedback 
is proposed. 

Validation and Assessment of the data in Step 3 is based on data collected on a 
sample of 50,744 raw records for the eight technologies. For example, one technology has 
approximately 4250 raw records, comprising an alphabet of 1583 primitive message 
terms, capable of generating 676,417 11 message sets, the data points which form the 
basis of the models. The data was taken in monthly intervals over a 21 year period. The 
nodes over the same time-period consisted of 22,394 author sets. This gives a very tight 
confidence interval, which is discussed in Chapter IV. The technologies were chosen 
because they were assumed to have well studied histories. These technologies include 
Ada, Java, Abstract Data Types, Rate Monotonic Analysis, Software Cost Models, 
Software Work Breakdown Structures 12 , Software Technology Transfer, and Software 

11 The confidence interval can be approximated by 1/√n. This is 1/√117,637 = +/- 0.292% for 
messages and 1/√22,394 = +/- 0.67% for author node sets. Generally, this can be considered a very tight 
confidence interval. 

12 The author performed significant research in Software Work Breakdown Structures for the 
Department of Defense in the 1990’s. Therefore, it was a technology with a well-known history. 
Engineering. The first technology, Ada, was well studied and, like the internet, was initially 
sponsored by a government organization. Java has a well known history and, like Ada, 
had significant early sponsorship (Sun), but many more users were exposed to this 
technology over a shorter period of time due to the emerging nature of the world wide 
web and standards driven by industry. The next three technologies (Abstract Data 
Types, Rate Monotonic Analysis, Software Cost Models) were studied elsewhere and 
offered a set of data for comparison with another model. The remaining technologies 
were chosen because the subjects were well known to the author and, in the case of 
software engineering, of general interest to the community. The discussion and 
validation of the model using these technologies is performed in Chapter IV. A heuristic 
development approach is used to validate the conclusions. The degree of formality (low) 
was determined by considering the current maturity of software engineering and its 
related science, computer science, relative to other disciplines at this stage. 
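
The footnoted approximation of the confidence interval by 1/√n can be checked with a short calculation. The sample sizes are the ones quoted in the footnote; the function name is hypothetical, and the 1/√n rule is taken as given from the text rather than derived here.

```python
import math


def approx_interval_percent(n: int) -> float:
    """Approximate half-width of the confidence interval, 1/sqrt(n), in percent."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 100.0 / math.sqrt(n)


print(round(approx_interval_percent(117_637), 3))  # 0.292 (messages)
print(round(approx_interval_percent(22_394), 2))   # 0.67 (author node sets)
```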

Data is collected on a variety of technologies that have been previously studied. 
The data is easily collectable and available to decision-makers at the macroscopic, 
observable, performance parameter level. At this point, the theory development and 
validation are done. With these models in place, future research can explore cycle analysis 
and implementation details can be refined. 

The appendices provide an overview of relevant advanced mathematical details, 
a general discussion of historical notes related to Chapter III, and the data used in Chapter IV. 
The appendix also includes a description of the entropy model codes and data reduction 
tools developed for this research. The tools used are Microsoft Excel and Access 
applications. Add-ins, in the form of macros, contain the analytical models. Interface 
code is written in Visual Basic. While research tools, they are suitable for performing 
analysis of the type proposed. A significant contribution is the software technology 
transition annotated bibliography in the appendix. This bibliography provides a data set 
for future analysis of the feedback model. 

Chapter V summarizes the contributions of this research and provides conclusions 
pointing to the scope of future work. It suggests that analysts are able to develop, from 
this point of departure, a point design for an “Industrial Model of a Software Technology 
Transition Engine”. 

Implications and future research are identified in Chapter VI. In addition, in 
Chapter VI, it is suggested that a software technology transition engine could be analyzed 
with the tools developed. We conjecture that such an engine, one that pumps 
technologies to the user community, should be efficient, i.e. the maximum amount of 
work product should get to the goal of insertion with the minimum amount of resources 
consumed and wasted. An analytical approach is suggested that uses a cycle diagram, 
familiar to physicists, mechanical engineers and thermodynamicists. The technology 
transfer TechTx dynamics cycle diagram and analytical approach could be used to 
evaluate the efficiency of the technology transfer engine. This approach is similar to a 
Carnot cycle analysis using state 13 points of entropy, temperature, and pressure. Chapter 
VI suggests areas for additional work: the notion of “squaring the Carnot cycle”; the 
Second Law Analysis; a description of the TechTx engine in terms of an evolutionary 
software development process; and identification of a software development entropy 
metric. Further, since this research has linked its foundation to physics and 
thermodynamics, we now have the full richness of those disciplines potentially available. 
This will permit building and extending software engineering with existing theory in 
these disciplines in the language familiar to the scientist and engineer. 

Findings: This research identified a minimum collection of variables that can 
represent a framework for an industrial strength model for software technology transition. 
Manipulation of these variables enables analysis of the cause and effect relationship of 
elements constituting a transition channel. The research suggests a set of relationships 
that can be manipulated in much the same way that science and engineering disciplines 
evaluate designs using physics and thermodynamics. The model presentation is suitable 
to communicate to policy makers. In fact, initial relationships developed in this research 
suggest that there is a "software physics" that can at least be applied to software 
technology transfer and, by extension, to evolutionary software development, and with 
further research, to the software itself. It may in fact apply to the evaluation of any 
13 State comes from the term status. 
evolutionary technology system’s assessment beyond the discipline of software. This is 
especially aligned to assist with biologically inspired computing, to compute with 
patterns, not bits. It appears that this logic development is not obvious if one approaches 
from the software and traditional deterministic "programming" direction. 
F. DEFINITION OF TERMS 

1. Τέχνη (Techne), Science, and Invention 

The title of this research pivots around the terms technology, transition and 
engine. All of the other terms are simply qualifiers that narrow the domain (software), 
target the user, and indicate robustness (industrial, implying, albeit loosely, the notion of usage in 
a non-trivial solution and operational space) and model (implying this product is a 
representation or approximation). The terms high capacity, accelerated, and high 
efficiency represent desired performance characteristics of the model. There is a desired 
causal relationship between the low-level elements, from which the model is constructed, 
and changes in the outcome of these performance parameters. 

We develop the terms of reference for this work starting with some definitions. 
Transition is the change based on some set of actions that moves the object, in this case 
technology. While we cannot draw this thing called technology, nor sense it, we can 
associate it with a collection or cluster of thoughts. If we accept that it 
could be such a cluster, then it is closely coupled to methods of recognizing and organizing 
some of its attributes as represented in these thoughts. In this dissertation, we develop a 
method to measure the patterns of those associations to enable quantification for 
mathematical manipulation. This leads us to include a key feature, which is a human 
aspect. 

Technologies reflect our human needs. They are mirrors of ourselves. The word 
technology helps us understand this "process". The Greek word τέχνη (or techne) 
describes art and skill in making things. Τέχνη is the work of a sculptor, a stonemason, a 
composer, or an engineer. The suffix -ology means the study or lore of something. 
Technology is the knowledge of making things. Let's put this in a context relative to 
science and engineering. 

The word science comes from the word scientia, which means "knowledge". We 
apply the word science to ordered and systematic knowledge. A scientist identifies what 
is known about things and puts that knowledge in some kind of order (Lienhard 2000). 
The ordering and systematic collection of information, represented in messages 
consisting of terms, is quantified with a measure in this dissertation. 

In its role as the science of making things, technology stands apart from the actual 
act of glassblowing or machining. It is the ordered knowledge of these things. It is also 
our means for sharing our knowledge of technique. 

Engineering comes from the Latin word ingenium, which means "mental power". 
English is full of words related to ingenium: ingenuity, which means "inventiveness", and 
"engine", which can refer to any machine of our devising, any engine of our ingenuity. 
For about three hundred years, science and τέχνη have joined forces primarily through 
engineers. Today's engineers are technologists who are well-schooled in science and can 
make effective use of it when they try to create the engines of our ingenuity. 

The three functions of τέχνη, science and invention, work together to make a 
product. People earn the title engineer when the goal of their labors is the actual creative 
design process, when they combine the knowledge of τέχνη with science to achieve 
invention. 

A machine normally receives its permanent name only after it has achieved a 
certain level of maturity, after it has settled into popular use in the community. Babbage 
gave a particularly intriguing name to his first programmable computer in the early 
nineteenth century. He called it an analytical engine. Software packages for checking 
programs were called parsing engines long before another engine word attached itself to 
computers: the now common term, search engine. We also think of an engine in terms of 
inputs, some process or transformation, and some output. This is true of a gas 
turbine engine, Babbage's analytical engine or a Turing machine. Under stable 
conditions, an input signal is translated by algorithm into a determinate output. This is 
how we use the term engine in this dissertation. We take an input and transform it into an 
output using the mental power of the mind, or of a group of minds in an organization. A 
physical engine can be characterized thermodynamically in a mathematical model. 
This research will develop the properties of the software technology transfer engine 
model. 





2. Epistemology and Software's Paradox 

First, we explore the approaches of science and engineering. As an exercise, 
establish a mental continuum. At one extreme is philosophy, at the other physics. 
Philosophy at its extreme is pure logical-mathematical knowledge detached from all 
experience. It contributes the organizational structures for the experimental, experiential, 
and epistemological search for truth. With the pure philosophical approach, experiential 
perception assumes frames of reference. At the other end of the continuum, physics at 
its extreme is a most developed science of experience. It is a perpetual assimilation of 
experimental fact with logical-mathematical structures. In this approach, we start with 
sensible experiences, and the very refinement of the experience serves as 
logical-mathematical instruments used as necessary between the subject and the object to be 
reached (Piaget 1977, p. 72). For philosophical musings in software engineering, we 
fall closer to the pure philosophy extreme, but to be practical and useful, we must be able 
to reach to sensible, physical reality that can be measured. 

The problem posed by software engineering is closely related to Planck's paradox. 
Planck suggested that physical knowledge appears to be based on sensation, yet it 
withdraws increasingly from sensation. The reason is that neither philosophy nor software ever proceeds 
from sensation, or even pure perception; at the very outset, each implies a 
logical-mathematical schematization of perceptions as well as actions exercised on objects. 
Beginning with such schematization 14 , it is natural that these logical-mathematical 
additions become increasingly important with the development of physical knowledge. 
Consequently, physical knowledge is constantly withdrawn more and more from 
perceptions as such. 

This is interesting. Software or information cannot be perceived by direct 
(primary, as defined by Locke) properties, but rather by indirect properties and effects. 
Let’s look at some sensible properties. For example, software has no “mass” or directly 
sensible weight. This means a basic measure that we might use from Newtonian physics 

14 Schema is a rule, or category that we use to organize, understand and formulate what we think. 
(Martin 1991) 
is unavailable to us. Software does not appear to have temperature, as a human would 
sense it. We can’t feel hot or cold software or information with our senses. We cannot 
stick a thermometer in and directly measure a temperature. It would appear not to have 
temperature. Hence, the physical knowledge for software is at the extreme of Planck's 
paradox at the very outset. The observer-scientist developing experiential data is always 
removed from direct observable property measurement. 

This research will suggest that a direct property related to a “volume” can be 
measured. It suggests that information entropy and other properties can be calculated. 
This research will explore property relationships that can be developed using 
mathematical equations of state. 

Software technology transition, software development and possibly software 
itself can be conceptualized as a flow process. Flow processes have gradients of 
temperature, velocity, and even concentration. A flow system assumes that the 
intensive properties at a point are the same as if the properties throughout the system 
were uniform and existed at equilibrium at the same temperature, pressure, and 
composition. The implication is that the equation of state applies locally and 
instantaneously at any point in the flow system. One may employ a local state concept. 
In the domain of information, this concept can be used almost as in physics and 
thermodynamics. The notion of local, however, needs to be extended. In this study, local 
is not defined in physical coordinates, because the medium, a social communication 
system or network, can communicate influence or, as we said earlier, establish correlation 
with more geographically remote nodes. 

This concept of local state is a universally accepted concept that is independent of 
the concepts of equilibrium and reversibility. At the very worst, it represents an 
acceptable approximation. 

The models herein for software technology transfer (or, in future research, 
evolutionary development or software) look heuristically at the logical-mathematical 
schematization of properties (extensive and intensive) for software engineering research 
equations of state. In this dissertation, we develop an abstract model, a 
logical-mathematical schematization, with relationships about information (measured in 
entropy), a property which cannot be directly measured. Mathematics is performed on 
the properties. Then we validate the model by taking quantities which we can measure, 
e.g., numbers of nodes, the count of terms, and production rates. We transform those 
measurements into volume, entropy and rate of change of state (the 1st derivative, which is 
like a velocity) publication rate distributions. Then we compare the predicted abstract 
measures with the observed values transformed to the indirect measures of entropy and 
frequency. 

Piaget develops such propositions, as he explored and traced the psychological 
origin of notions back through history to their pre-scientific stages. The fundamental 
notions of physical space, speed, and causality are in fact borrowed from a common 
meaning very much prior to their scientific organization. He studied a kind of mental 
embryology in his development of a theory of knowledge. Piaget eloquently develops a 
line of reasoning that shows that all the sciences have a common thread. That is, in the 
process of developing the science or knowledge, there is a fundamental learning curve. 
The learning curve takes on the role of varying the efficiency of a physical system. The 
learning curve acts as a transfer function from state to state of the system. 

There are many studies about the proper formulation for learning curves for 
different problem sets. The majority of the learning curve models indicate that the time 
to perform a task decreases with the number of times the task has been performed. This is 
covered extensively in the literature. A review of the relevant historical studies is shown 
in Chapter II (in 10. Learning Curves, p90). Chapter III develops the learning curve 
formulations used in this research (Appendix G Learning Curve, p443). 
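
One common learning curve formulation from the literature is the power-law form, in which each doubling of the number of repetitions multiplies the time per task by a fixed learning rate. The sketch below uses that form purely as an illustration; it is an assumption of this example, and the formulations actually used in this research are the ones developed in Chapter III and Appendix G. The function name and sample values are hypothetical.

```python
import math


def unit_time(n: int, first_unit_time: float, learning_rate: float) -> float:
    """Time to perform the n-th repetition under a power-law learning curve.

    Assumed form: T_n = T_1 * n ** log2(learning_rate), so each doubling of n
    multiplies the unit time by learning_rate (for example 0.8 for an 80% curve).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0 < learning_rate <= 1:
        raise ValueError("learning_rate must be in (0, 1]")
    exponent = math.log2(learning_rate)  # negative or zero
    return first_unit_time * n ** exponent


# Hypothetical task: 10 hours the first time, 80% learning rate.
for n in (1, 2, 4, 8):
    print(n, round(unit_time(n, 10.0, 0.8), 2))
```

Under these assumed values the second repetition takes about 8 hours and the fourth about 6.4 hours, which is the decreasing time per task that the learning curve models describe.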

3. Learning Vignette (Meno and Socrates) 

Let's start this thread with the discussion of rational analysis. There are many 
points where one can start the development of the relationship between rational analysis 
and experiential accumulation of understanding in the reduction of uncertainty, that is, 
truth (epistemology) and the search for truth (science). The ancient philosophers 
Pythagoras, Protagoras, Socrates, and Plato started the first discourse (the message) that has 
continued throughout history. Socrates’ dialogue with Meno (Plato c. 428-c. 348 BC, p. 163; 
Polanyi 1969) addresses an essential question in the search for truth. This is the discovery 
of distinct types of knowledge: the knowledge of facts of daily life (experiential), and 
truth, that which has always been and will always be true. With Meno and the Socratic 
method, we observe immersion, decisions, and a learning process. Socrates did not teach 
Meno the previously unknown (to Meno) Pythagorean principles for the area of a figure. 
Rather, Socrates guided Meno via rational thought and decisions through a discovery 
process. A process implies some type of activity. A process causes a change from one 
state to another state. Questions were asked and Meno made decisions based on 
information input: a series of symbols, scratchings, and utterances. There was a change in 
the state of Meno’s knowledge as he absorbed and combined symbols. There was 
progress as symbols were put into order and associations were understood. We shall see 
in Chapter III (B. Information Theory - Shannon’s Entropy, p. 96) that information 
entropy is related to the number of decisions that must be made. While the scratching of 
a geometric figure on the sand was real for the moment, and sensible, it was not the true 
form of a right triangle, but merely a representation or a model of $a^2 + b^2 = c^2$. 
Examining the dialogue, we see a learning process that included experiential action 
(observing the figure, and counting). We also witness the progressive accumulation 
of understanding as Socrates and Meno interacted (or communicated; Socrates only 
provided guidance), as Meno did the unpacking of the technology "message" from 
Pythagoras. This process is characterized by accumulation learning, modeled by learning 
curves in Chapters II and III (10. Learning Curves, p. 90, and p. 443). Part of the effort 
in reducing the uncertainty (Wehrl 1978, and others) in the message went into unpacking, 
or deciphering, and the use of a protocol. There is a length of a process (program) that 
is required to unpack a message (Kolmogorov 1956, Wehrl 1978, Li 1993, Chap. 2, 3). In 
this case, the encryption and protocol were the formalisms of mathematics and logic. 
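
The link between information entropy and the number of decisions can be illustrated with a short sketch. It computes Shannon entropy in bits for a sequence of symbols; for equally likely symbols the result equals the number of yes/no decisions needed to single one out. The function name `shannon_entropy` and the toy symbol strings are assumptions for illustration, not data from this research.

```python
import math
from collections import Counter


def shannon_entropy(symbols) -> float:
    """Shannon entropy, in bits, of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Eight equally likely symbols: three binary decisions identify one symbol.
    print(shannon_entropy("abcdefgh"))
    # A message with a single repeated symbol carries no uncertainty.
    print(shannon_entropy("aaaa"))
```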

The key points this research will develop are all in this ancient vignette: 
reduction of uncertainty through discovery, learning, and persistence of a message. 
The following chapters of this dissertation will review the literature (Chapter II), 
develop a model related to evolution of technology (Chapter III), and validate the model 
based on software engineering technology data (Chapter IV). 

4. Communication, Continuity 

Communication is a process in which participants create and share information 
with one another in order to reach a mutual understanding. This definition implies that 
communication is a process of convergence (or divergence) as two or more individuals 
exchange information in order to move toward each other (or apart) in the meanings they 
ascribe to entities (objects, acts, events, etc.) (Rogers 1983). Rogers and Kinkaid 
represent this communication in the general case as a two-way process of convergence 
rather than a one-way, linear act in which one individual seeks to transfer a message to 
another (Rogers Kinkaid 1981). 

This simple concept of human (or machine) communication seems to accurately 
describe certain communication acts or events involved in technology diffusion. 

5. Diffusion 

Diffusion is the process by which an innovation is communicated through certain 
channels over time among the members of a social system. It is a special type of 
communication, in that the messages are concerned with new ideas (Rogers 1983). An 
example is a change agent seeking to persuade a client to adopt an innovation. 
Examining what occurs in the time step prior to an event and after an event, it is clear the 
event is only a part of a process of exchange between individuals (or machines). Rogers 
asserts that it is the newness of the message content of the communication that gives 
diffusion a special character. The newness implies that some degree of uncertainty is 
involved. 

6. Uncertainty and Confidence 

Let's set the context. How do we make choices in the face of uncertainty? We 
know that a reasonable person having some historical experience with a true coin A 
would assign a degree of belief (subjective probability) of about 0.5 for heads. 
Based on the history with the coin, we would be rather confident in that belief. Now 
imagine a coin B about which we know absolutely nothing. We don’t know 
whether it has two heads or two tails or if it is a fair coin. Yet, if we had to pick, we 
would be compelled to assign the same 0.5 probability, since we lack any information to 
indicate a greater or lesser belief in heads vs. tails. But our confidence in 0.5 for coin B 
would surely be less. 
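
One way to make the distinction between belief and confidence concrete is sketched below. It assumes a Beta distribution over the probability of heads with a uniform prior, which is a modeling choice introduced here for illustration and not a method stated in this research; the observation counts for coin A are likewise hypothetical.

```python
def beta_mean_and_sd(heads: int, tails: int,
                     prior_a: float = 1.0, prior_b: float = 1.0):
    """Mean and standard deviation of a Beta belief about P(heads).

    Assumes a Beta(prior_a, prior_b) prior updated with observed tosses.
    """
    a = prior_a + heads
    b = prior_b + tails
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance ** 0.5


if __name__ == "__main__":
    # Coin A: a (hypothetical) history of 50 heads and 50 tails.
    print("coin A:", beta_mean_and_sd(50, 50))
    # Coin B: no history at all.
    print("coin B:", beta_mean_and_sd(0, 0))
```

Both coins receive the same point belief of 0.5, but the spread of the belief about coin B is much wider, which corresponds to the lower confidence described above.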

It is not the psychological sensation of confidence that we are 
interested in. Rather, for an engineer or decision maker, the consequences of the decisions 
are the driving issue. When we have the option of acquiring information through an 
informational action, we are likely to invest energy (money, effort) before making a 
decision that results in a terminal action. We would be willing to invest more 
effort in acquiring information about coin B than about coin A. So we see that one’s informational 
actions, though not one’s terminal actions, do depend on one’s confidence in beliefs. 

This notion of confidence plays an important role in this discourse's assessment of 
a software technology. 

We are influenced by a number of subjective factors that are always at work. 
These subjective factors mirror ourselves and often are the emotions of the heart. Beauty 
and efficiency in art and music, for example, drive human needs as well as functional, 
quantifiable attributes to reduce the expenditure of labor and effort to achieve a goal. We 
would be remiss if we did not at least acknowledge the effect these subjective needs have 
on shaping our technology. The effect of the shaping of technology by these subjective 
factors, which serve the more elemental needs, is not that evident by direct observation. 
There is psychology at work in our methods of acceptance, understanding and ability to 
assimilate. Often, the reason technology is impossible to predict is that our predictions 
are inevitably shaped by those factors that are fairly evident. This, therefore, requires 
that we address the process of assimilation of knowledge. Using statistics and probability 
theory, we will stop short of turning this into a study in psychology. 

7. Chance, Aggregation through Mixing 

Today we tend to regard knowledge as a process more than a state. This stems 
partly from the epistemologies of the philosophies of science: The probabilism of the 
French mathematician Cournot, and his comparative studies of various types of notions 
set the stage for such an understanding. Critical reviews of historical works, which reveal 
the oppositions among the various types of scientific thought, clearly promote such a 
development. Even after the victory of Newton, physics believed for hundreds of years 
in the absolute character of its principles. So, the arguments developed in this research 
very much depend on the state and maturity of the knowledge process for software 
engineering. 

Another probabilistic feature of software technology transition is chance. Chance 
is a curious notion which is defined by Cournot as an interference of independent causal 
series and which generally can be designated under the term "mixture". (Piaget 1977, p. 
19) This is an important concept to expose. Mixture is irreversible and grows with an 
increasingly weaker probability of return to the initial state. This starts to address the 
aggregation typical of composition of terms and integrating domains and technologies. 

II. ASSESSMENT OF PREVIOUS WORK 


A. TECHNOLOGY TRANSFER MODEL FEATURES 

Technology transfer (TechTx) or transition is referred to as diffusion in the 
literature. This section reviews the basics of technology transition models. Various 
theories and principles felt to be underlying human behavior and learning are presented. 
The technology transition model basics identified in the literature are then summarized. 
Seven sections identify research facets or features relevant to technology transfer. These 
models are shown in Table II-1, which lists each model, a key feature of the model, and 
an indication of how the model proposed in this research addresses that feature. Each of the 
models in Table II-1 is summarized in the following sections. 


| Model in TechTx Literature | Model Feature | Proposed Information/Control Theory Model |
|---|---|---|
| Theory of Human Needs Model | Complexity factor framework: facts, perceptions, [...] | Learning curve; actions on messages (tasks) |
| Structure Changes Model | Internal and external relationship | Shannon entropy of messages; joint entropy; information in, [...] |
| Technology Model | Goodness of technology alone causes diffusion | |
| Institution Building Model | External influences affect the human behavior to assimilate a technology | Identifies entropy as a factor that can influence the acceptance of a technology |
| Equilibrium vs Conflict Model | Equilibrium is an instrument for balance; conflict is an instrument to apply pressure | |
| Communication Model | A technology is delivered to adopters through a channel; if understood, it is acted upon | |
| Problem Solving Model | Present hypothesis; test hypothesis with data and logic | Hypothesizes a mathematical model and explains based on [...] |

Table II-1. Technology Transfer Models, Features, and Relation to Proposed Model 

1. The Theory of Human Needs (Leagans 1979) 

The theory of human needs (Leagans 1979, p. 15) has a number of components. 
These are as follows: the facts, the perception of the facts, human attitudes or value 
judgements about the facts, and human actions related to the facts as they perceive them. 
Leagans establishes a framework addressing the complexity factors that affect behavior 
with respect to technology transfer. The model elements suggested in our current 
research had to be general enough to permit lower level detailed elaboration that could 
address these details. This requirement for generality is driven by the need to refine the 
models to address implementation aspects of technology transfer. The proposed model 
addresses this through the mechanism of the learning curve and decomposition into 
organization and sub-organization nodes. 

2. Structure Changes - Internal - External Relationship (Piaget) 

While the work of Piaget [1] was not focused on technology transfer, it is 
fundamental to learning schemes and to the accommodation of these schemes to the 
environmental situation (Piaget 1963, p. 103). He develops the relationship between the 
genotype (internal) and phenotype (external) information influences. Neither internal 
nor external factors can individually explain human development of skills. We can think 
of this learning in terms of the acquisition of technology. Human knowledge and 
skill development seems to tend toward the establishment of an equilibrium between the internal 
and external factors (Piaget 1967, p. 113). The proposed TechTx Entropy Learning 
Curve model explored in this dissertation addresses this in several ways. First, the 
Shannon entropy approach takes a vocabulary as input and a vocabulary as output 
and, from the joint entropy (Bayesian) relationships, yields a grammar. In both the 
TechTx Entropy Learning Curve and TechTx Entropy Feedback models, the vocabulary-grammar 
relationship between internal and external factors is incorporated using 
Shannon’s statistical approach to entropy. The TechTx Entropy Feedback model 
addresses mixing. It also accommodates structural changes (more explicitly addressing 
the external factor) due to feedback from external nodes. 

[1] Jean Piaget (1896-1980) was a Swiss psychologist. 

3. Technology Model 

The technology model (Leagans 1979) deals with potential. This model suggests 
that the attractiveness of a new technology alone is sufficiently strong to induce wide 
diffusion, acceptance and adoption by users. It tends to assume that users would use the 
new technology and attendant parts of the technology successfully without the 
persuasions of an organized education system. This model has proven highly inadequate 
when trying to introduce technology to large masses of users, rather than the elite self- 
motivated few (Leagans 1979, p. 17). This inadequacy is also consistent with the small 
percentage of innovators and early adopters identified by Rogers (Rogers 1983 p. 247). 
However, it does imply that a pressure or a vacuum may have some influence. For example, the 
growth of the Internet creates a requirement and hence a vacuum, and intelligent agents 
move in to fill the void. This is analogous to the saying, “necessity is the mother of 
invention.” The current research detailed in this dissertation does not directly address 
potential or a vacuum. However, the model currently being explored seems to set the 
stage for future research to be able to see the effects of a vacuum. 

4. Institution-Building Model 

The laws of maximum and minimum are often referred to as the institutional 
factors that explain the forces influencing plant growth. This has been applied to human 
behavior with the following rationale (Leagans 1979, p. 13): human behavior is the 
dependent variable. The assumption is that man can influence the economic, biological, 
and other forms of change to the extent that he controls the forces (nutrients) that 
influence change and the status quo. In this context, Leagans argues that people see one or 
more inhibitors (limiting factors) and one or more incentives to innovation 
simultaneously in any situation. These variables exert varying force or 
valence on the dependent variable, human behavior, and when the deficiencies 
(inhibitors) are weakened or removed, the balance or equilibrium of opposing forces will 
be altered. Changes in human behavior are expected to be proportionate to the amount of 
cumulative influence or valence exerted by the change incentives present. These changes 
are net of the counteracting influences, or change inhibitors, operating in the 
situation. 

The model in this research uses information theory to quantify the 
probability, via mutual information and joint and conditional entropy, as a method to 
address the valence of these forces. Further, the current study builds on the notion of 
need for feedback being proportional to the cumulative influence of the change incentives 
(information) present. The control model used herein is non-linear. This addresses the 
comment by Leagans (Leagans 1979, p. 14) that “the input-output function is not always 
linear.” He states that the probability derives from variation in the nature of the 
influencing factors, which vary by situation. For the research herein, we address this by 
means of an ensemble of primitive, probabilistic communication interactions using 
both information and control theory. 
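
The information-theoretic quantities named above can be computed from a joint probability table, as in the following sketch. It uses the standard identities H(Y|X) = H(X,Y) - H(X) and I(X;Y) = H(X) + H(Y) - H(X,Y). The function names and the two toy joint distributions are illustrative assumptions and do not come from the data used in this research.

```python
import math


def entropy(probs) -> float:
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_measures(joint: dict) -> dict:
    """Joint, conditional, and mutual information from {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p  # marginal of the sent symbol
        py[y] = py.get(y, 0.0) + p  # marginal of the received symbol
    h_x = entropy(px.values())
    h_y = entropy(py.values())
    h_xy = entropy(joint.values())
    return {
        "H(X)": h_x,
        "H(Y)": h_y,
        "H(X,Y)": h_xy,
        "H(Y|X)": h_xy - h_x,
        "I(X;Y)": h_x + h_y - h_xy,
    }


if __name__ == "__main__":
    # Receiver always reproduces the sender's symbol: full mutual information.
    print(information_measures({("a", "a"): 0.5, ("b", "b"): 0.5}))
    # Receiver's symbol is unrelated to the sender's: no mutual information.
    print(information_measures({(x, y): 0.25 for x in "ab" for y in "ab"}))
```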


5. Equilibrium vs. Conflict Model 

In the equilibrium vs. conflict model, equilibrium is regarded as an instrument for 
achieving balance, while conflict is an instrument for applying pressure. Some 
combination of these divergent approaches does in fact operate in most models as a force 
for motivating people to adopt new patterns of behavior. This is consistent with Piaget 
and the tendency toward the establishment of an equilibrium of these factors. In 
developing the mathematical model of this study, it was interesting to discover that the 
communication control model used can settle into equilibrium (oscillating), repeller, 
or attractor states. Oscillation is seen under some conditions of the feedback 
model. When there is a vacuum, or pressure is applied to a node, learning is more rapid, 
up to a point. Ultimately each statistical band of nodes reaches capacity. This can be 
seen in the proposed models. 
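
The qualitative regimes mentioned above (a stable attractor, oscillation, and irregular behavior) can be seen in even the simplest non-linear feedback iteration. The sketch below uses the logistic map purely as a stand-in; it is not the communication control model of this research, and the parameter values and function names are assumptions chosen for illustration.

```python
def logistic_orbit(r: float, x0: float = 0.2, burn_in: int = 2000, keep: int = 64):
    """Iterate x -> r * x * (1 - x), discard a transient, return the next states."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


def distinct_states(orbit, tol: float = 1e-6) -> int:
    """Count states in the orbit that differ from each other by more than tol."""
    states = []
    for x in orbit:
        if not any(abs(x - s) < tol for s in states):
            states.append(x)
    return len(states)


if __name__ == "__main__":
    for r in (2.8, 3.2, 3.9):
        # One state: fixed-point attractor. Two: oscillation. Many: irregular.
        print(r, distinct_states(logistic_orbit(r)))
```

In this stand-in, the feedback gain r plays a role loosely comparable to pressure on a node: a modest gain settles to a single state, a larger gain oscillates between two states, and a still larger gain never settles.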

Prigogine (Prigogine 1980, 1984), who won the Nobel Prize in 1977, says that 
living (read this as evolving) systems are rarely static, and if they are, they are likely to 
atrophy and die from stagnation. Living organisms do not thrive in a state of balanced 
equilibrium, but usually in fluctuating restlessness. Consumers, organizations, and the 
technology evolution system itself seem to act as a living organism. The model 
developed herein addresses these concerns. 

6. Communication Model 

The communication model is considered the classical model for diffusion of 
technology. It is well developed and documented by Rogers (Rogers 1983, 1995). This 
consists of making a new technology discovery, delivering it to potential adopters 
through various communication channels, and then having it understood and acted upon by 
the consumer. The communications model is generally seen as a macro model. 

Almost every well-researched technology transfer model addresses the 
communication model. Leagans (Leagans 1979, p. 19) cites Rogers (Rogers 1975), who 
identified several shortcomings of the model. These include the need to address greater 
process orientation, greater attention to causality, and recognition that the adoption 
requires a physical or overt act. This dissertation addresses these shortcomings in the 
formulation of the mathematical model in section 6. The process aspect is in the message 
and feedback loops in the control model. Causality and the overt act are built into the 
transforming function $f(x_k)$ in a time step in Chapter III. 

7. Problem Solving Model 

This model presents a hypothesis that explains a troubled situation. It tests 
the hypothesis with data and logic, putting the specific results into a model. 
A hypothesis for solving the problem is then formulated. The proposed solutions are 
tested by implementing programs and evaluations to assess the consequences. This 
implementation and evaluation includes the means and the ends. Boehm and Basili (Boehm 1999, 2000) 
essentially espouse that the Department of Defense institute a national effort with 
Centers for Empirically Based Software Engineering (CeBase) to address transition, 
using essentially this model. 

The current study develops a model at a macro, or strategic, level to 
predict and plan the technology infrastructure portfolio of a National Technology 
Transition effort. The current model efforts and elements are reflected in the Department 
of Defense Software Engineering Science and Technology Summit findings (Boehm 
2001). 


8. Classic Diffusion Tech Tx Models (Rogers 1983, 1995) 

Diffusion of Innovations (Rogers 1983, 1995) is one of the most valuable 
readings on technology transition in general. Virtually all aspects of 
technology diffusion are covered. Rogers discusses a communication model that depicts 
the classic business school "S" curve (Rogers 1983, p. 47). This is a cumulative plot of 
publications covering a given topic over time. Further, he categorizes the four main 
elements of diffusion of innovations as follows: 

• The Innovation 

• Communication Channels 

• Time 

• A Social System 

He lays out clear definitions that are commonly accepted in the literature of 
technology transition and diffusion. Rogers' lexicon can also be seen in the software 
engineering technology transfer literature, (see Moore 1991, Redwine 1984, Fowler 
1994, Fichman 1993, Zelkowitz 1995, and Pfleeger 1999). 

Looking at Rogers’ work, you can see all of the elements of a communication 
system. He classifies and distributes the types of adopters (see Figure II-1) as innovators, 
early adopters, early majority, late majority and laggards. He stresses the uncertainty-reduction 
aspect of technology. He, as do many, uses the terms “innovation” and 
“technology” as synonyms. 

Figure II-1. Distribution of Adopters. 

(Source: Rogers 1983, p. 11). 
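
The adopter distribution can be sketched numerically. In Rogers' standard presentation the five categories are cut at one and two standard deviations from the mean time of adoption on a normal curve; the sketch below assumes those cut points and an exactly normal distribution, so the computed shares differ slightly from the rounded percentages usually quoted. The function name `adopter_shares` is an illustrative assumption.

```python
from statistics import NormalDist


def adopter_shares() -> dict:
    """Share of adopters per category, assuming normally distributed adoption time.

    Cut points are in standard deviations from the mean adoption time.
    """
    z = NormalDist()  # standard normal
    cuts = [-2.0, -1.0, 0.0, 1.0]
    names = ["innovators", "early adopters", "early majority",
             "late majority", "laggards"]
    cdf = [0.0] + [z.cdf(c) for c in cuts] + [1.0]
    return {name: cdf[i + 1] - cdf[i] for i, name in enumerate(names)}


if __name__ == "__main__":
    for name, share in adopter_shares().items():
        print(f"{name:15s} {share:.3f}")
```

Accumulating these shares over time gives the cumulative "S" curve discussed above.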

Rogers identifies technology as a design for actions that reduce the uncertainty in 
the cause and effect relationship involved in achieving a desired outcome (Rogers 1983, 
p. 12). The technology developed in the case of this study is itself the technology transfer 
model. The TechTx Entropy Learning Curve and Feedback models use a transfer 
function to represent the reduction in uncertainty and the cause and effect relationship. 
The proposed model in this research provides a method to analyze options for 
instrumental actions in order to reduce uncertainty in the arrival of a given set of software 
technologies. 

a. The Innovation 

In the literature, technology generally is seen as having two components, 
hardware and software. Rogers is speaking of hardware and software in the most general 
sense, not limited to computers: 1) hardware consists of the tool that embodies the 
technology as material or physical objects; 2) software consists of the information base 
of the tool. 

Technological innovation creates one type of uncertainty in the minds of 
potential adopters (about its expected consequences), as well as representing an 
opportunity for reduced uncertainty in another sense (that of the information base of the 
technology itself). The latter is the potential uncertainty reduction representing the 
possible efficacy of the innovation in solving an adopter’s need or perceived problem. 

Once information-seeking activities have reduced the uncertainty about 
the innovation's consequences to a tolerable level, a decision to use the innovation will be 
made. Figure II-2 shows the probability of use of various technologies vs. time. As 
the probability of use increases, the risk decreases at a given time. We can see this by 
analyzing the probability distributions of the consumption of information when observing 
the output of an organizational unit. We can compare the stochastic dominance of two 
alternatives, accounting for the utility (a function of return and risk) of each alternative. 
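
A comparison of two alternatives by stochastic dominance can be sketched as follows. The sketch checks only first-order dominance over a shared, ordered set of outcomes, which is a simplification assumed here; it does not include the utility weighting of return and risk mentioned above, and the probability vectors are hypothetical.

```python
from itertools import accumulate


def first_order_dominates(probs_a, probs_b, tol: float = 1e-12) -> bool:
    """True if alternative A first-order stochastically dominates alternative B.

    probs_a and probs_b are probabilities over the same outcomes, ordered from
    worst to best. A dominates B when A's cumulative distribution is never
    above B's and is strictly below it for at least one outcome.
    """
    cdf_a = list(accumulate(probs_a))
    cdf_b = list(accumulate(probs_b))
    never_worse = all(a <= b + tol for a, b in zip(cdf_a, cdf_b))
    somewhere_better = any(a < b - tol for a, b in zip(cdf_a, cdf_b))
    return never_worse and somewhere_better


if __name__ == "__main__":
    # Outcomes ordered worst to best, e.g. low, medium, high payoff.
    a = [0.1, 0.3, 0.6]
    b = [0.3, 0.4, 0.3]
    print(first_order_dominates(a, b), first_order_dominates(b, a))
```

When the cumulative distributions cross, neither alternative dominates at first order, and a utility function would be needed to rank them.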

The models in this research address the innovation-decision process, 
which is essentially an information-seeking, information-sending, and information-processing 
process. While this is not directly visible in the TechTx Basic Entropy model, 
the effects of the learning curve are found in the TechTx Entropy Learning Curve model. 
The TechTx Entropy Feedback model, working at the organizational and sub-organizational 
node level, factors in the request for clarification and feedback in order to 
reduce the uncertainty about the advantages and disadvantages of the innovation. 

Figure II-2. Diffusion. (Source: Rogers 1983, p. 11) 


b. Communication 

The primary model in Rogers 1983 is a communication model. While 
Rogers lays out the communication channel element as a component critical to diffusion, 
he presents and references an enormous amount of empirical data without addressing the 
model in terms of a communications system. Applying communication and information 
theory methods to this observation is indeed an area that could benefit the study of 
software technology transfer. The benefit of an information theory and communication 
model approach has not been addressed to date. The model developed in this dissertation 
suggests a quantitative method to address the communication model using Shannon’s 
entropy, the eigenvalue of the baker's transformation (an entropy) of the control model, 
and learning curves. 

c. Time 

Time is an important element of the diffusion process. Rogers (Rogers 
1983, p. 36) identifies time as involved with 1) the innovation-decision process, 2) 
innovativeness, and 3) the rate of adoption of the innovation. 

The innovation-decision process is the mental process through which an individual or 
decision-making unit passes from first knowledge, to forming an attitude about the 
innovation, to a decision to adopt or reject, to implementation, and finally to confirmation or 
validation of a decision. In Figure II-2, the horizontal distance at a given y value of 
risk/certainty between the upper band and the lower band can be seen as representing this 
time difference from knowledge to confirmation. Convergence tells us something about 
the maturity of a technology. 

Innovativeness is the degree to which an individual or other unit of 
adoption is relatively earlier in adopting new ideas. Each individual or unit is 
categorized into one of the five adopter categories. 

The rate of adoption is the relative speed with which an innovation is 
adopted. A steeper curve in Figure II-2 indicates a higher rate of adoption. 

Time does not exist independently of events. It is an aspect of every 
activity. We think in terms of astronomical time, or time differences similar to asking a 
person on the street for the time and they look at their watch. Rogers and all of the 
technology transition literature address this type of time. This is time as described in 
classical physics. We in western scientific tradition take this for granted since the 
writings of the philosopher Aristotle, in which time is closely related to motion and 
therefore to space. This is a classical interpretation of time in which the present separates 
the past from the future. 

In the basic work Process and Reality, Whitehead emphasizes that 
simple location in space-time cannot be sufficient and that the embedding of matter in 
a stream of influence is essential (Prigogine 1983). Whitehead emphasizes that no entities, 
no states can be defined without activity. No passive matter can lead to a creative 
universe. 

It is only recently that time can be expressed in a precise mathematical 
form. Since we are faced with Planck’s Paradox, with the absence of a physical reality, 
this study moves toward the mathematical notion of time, as taken with the use of the 
baker's transformation in time steps and presented by Prigogine (Prigogine 1983, 1989, 
1997). 

The baker's transformation is essentially the folding and stretching that 
results in mixing. The baker's transformation is well summarized by 
Prigogine (Prigogine 1989, pp. 200-205). To better understand the function, let’s examine 
two examples normally given to describe the process. Imagine Rome: when we observe 
the city, we see architecture and buildings from many time periods. They are all 
interspersed and mixed into the city. These areas and remnants, which are interspersed, 
are the result of mixing over a number of iterations. The other example, and the one from which 
the baker's transformation gets its name, is the folding and stretching of dough horizontally 
and vertically. Take a piece of dough, and place a spot of sauce on the dough. Fold the 
dough. Stretch the dough to the original area again. Then successively repeat the 
action. We can let $X_n$ be the function that represents the value corresponding to 
the application of n baker's transformations. 

$$X_{n+1} = F(X_n) \qquad (2.1)$$

The various functions $X_n$ are functions of internal time. The internal time 
is an operator like the one used in quantum mechanics. The age of partition $X_n$ is the 
number n of iterations that are to be performed to go from $X_0$ to $X_n$. Whenever the 
internal time exists, it is an operator, and not a number. The dynamics described by the 
baker's transformation are conservative, invertible, time reversible, recurrent, and chaotic. 
These are the same properties that characterize real-world dynamical systems 
showing complex behavior (Prigogine 1989, p. 203). Further discussion can be found in 
Appendix A, Information, Control Theory and Evolutionary Dynamical Systems Basics, 
and in Prigogine (Prigogine 1983, 1989, 1997), Farmer, Ott, and Yorke (Farmer 1983), McCauley 
(McCauley 1993), and Baker (Baker 1990). Equation 2.1 is the form of the finite difference 
equations used in the models. 

d. Social Structure 

The social structure provides the network and media to transmit the 
messages in the communication-diffusion model. Rogers (Rogers 1983, p. 25) quoted 
Katz: “It is as unthinkable to study diffusion without some knowledge of the social structure 
in which potential adopters are located as it is to study blood circulation without 
knowledge of the structure of the veins and arteries.” The social system is a set of 
interrelated units that are engaged in joint problem solving to accomplish a common goal 
(Rogers 1983, p. 24). In other words, the model is a kind of graph. 

There is more to it than interrelated units when establishing the network of 
individuals and organizations. Hargadon (Hargadon 1997) provides an interesting insight, 
via an ethnography on these network mechanisms, into technology brokering and 
innovation in a development firm that produces one-of-a-kind products. He identifies the 
mixing mechanisms and the feedback process, building on historical data and experience. 
The experience is held in informal networks and is communicated in terms that are 
aggregations and abstractions of terms that were used in prior internal efforts. Typical of 
the communication were shorthand descriptions that would sound like, “We can build 
this with an X like a Y from the Z project.” In this dialog, Y is an abstract chunk of a 
previous project. 

Allen (Allen 1977, 1983) and others emphasize the importance of the 
“messages” from outside organizations. He indicated that as many as 80 percent of the 
messages come from sources outside the organization. This is interesting since the model 
proposed draws on external sources of information providing “messages”. This is also 
one of the points of departure from a thermodynamic system consisting of particles. In a 
thermodynamic system with physical particles, the important feature of stochastic 
dynamics is the local, short-range character of the interactions. In the physical system, 
the number of transactions going on per unit time in a system of size N must be 
proportional to the size. That is, each element can only sense its neighbors. In a social 
system, especially in the technology transfer communications of today, due to the 
Internet, mass media, telecommunications, and full-text indexed databases, this local 
character has to be redefined. Local is not geographically local, but rather defined as 
accessible by a direct contact. Each element can simultaneously sense all of the other 
elements present. This is addressed in the input to the models developed in Chapter III. 
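
The contrast between short-range and all-to-all interaction can be made concrete by counting possible pairwise links. The sketch below assumes a ring of nodes with a fixed number of neighbors for the local case and a fully connected graph for the social case; both topologies and the function names are assumptions for illustration, not the network structure used in the models.

```python
def local_interactions(n: int, neighbors_per_node: int = 2) -> int:
    """Undirected links when each node senses only a fixed number of neighbors."""
    return n * neighbors_per_node // 2


def global_interactions(n: int) -> int:
    """Undirected links when every node can directly reach every other node."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Local links grow in proportion to N; global links grow roughly as N squared.
        print(n, local_interactions(n), global_interactions(n))
```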

Another aspect influencing network size in a social system is “who you 
know”, along with the efficiency and the endowment of the social network. There is a method to 
determine effective and efficient network size and diversity, referred to as optimizing 
structural holes of social capital (Burt 1992). Essentially social capital is found in 
relationships - “who you know.” It is managed, and it aggregates from people to 
organizations and can be orchestrated to build an effective social structure and network. 
The model proposed in this dissertation addresses the node linkages of authors and 
corporate sources by using the joint entropy of Shannon allocated to performing nodes. 
While the models herein do not develop these details, the models have been developed to 
accommodate a structural hole analysis. The approach chosen enables later refinements 
as detailed node relationships are developed for lower level models, e.g. references cited 
or actual studies of message traffic to a receiver node. 

In competitiveness, or survival, social capital is organized naturally 
around human behavior and the principle of least effort. In simple terms, this 
principle says that a person solving immediate problems will view them 
against the background of the person’s future problems, as estimated by the person. 
Moreover, the person will strive to solve these problems in such a way as to 
minimize the total work that must be expended in solving both the immediate 
problems and the future problems. That in turn means that the person will strive 
to minimize the probable average rate of work-expenditure (over time). In so 
doing, the person will be minimizing effort (Zipf 1965, p. 1). 

In the area of software engineering, Boehm (Boehm 1989) developed 
Theory W to help individuals and organizations negotiate win-win conditions, given 
constraints and alternatives. Theory W is a management theory and approach which says 
that making winners of the key stakeholders is a necessary and sufficient condition for an 
effort’s success (Boehm 1998). First-hand experience by the Army (Saboe 2001a) over 
the last 10 years with the WinWin process model and tool indicates that Theory W does 
provide a method for a group of individuals (and, by extension, this could be seen as 
representative of organizations) to analyze and act over a larger visible decision space 
when acquiring a software engineering process technology. This enables the 
principle of least effort to be used in a group setting in a more quantitative fashion. 

The current research addresses minimum effort through the study of joint 
entropies in the model. Minimizing the rate of change of entropy, i.e. watching a 
technology mature, is something that can be observed in the model. On the prescriptive 
side, actions can be taken to get the technology to stabilize more quickly. This is accomplished 
by investing in refinements, redundancy of the message set, propagation of the messages, 
increasing the number or quality (performance index) of nodes, and analyzing the effect 
on the entropy. Hence, the principle of least effort has a place in the model. With the 
foregoing, we are armed with a qualitative discussion of the basics that influence technology 
transfer. The next section discusses an initial experiment for the software-engineering 
field to count messages following Rogers’ method. This experiment shaped the method 
that would be developed in this dissertation. Largely, these considerations led this 
research to a heuristic solution instead of a formal statement of the models. 
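
As a rough illustration of watching the rate of change of entropy, the sketch below computes the Shannon entropy of the terms observed in each year and the year-to-year differences. It is only a minimal sketch: the term lists are invented, the function names are hypothetical, and treating entropy as the Shannon entropy of a yearly term-frequency distribution is an assumption made here, not the formal definition used by the models in later chapters.

```python
import math
from collections import Counter


def shannon_entropy(terms):
    """Shannon entropy (in bits) of the empirical term distribution."""
    counts = Counter(terms)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_rate_of_change(entropy_by_year):
    """Difference in entropy between each year and the previous observed year."""
    years = sorted(entropy_by_year)
    return {
        later: entropy_by_year[later] - entropy_by_year[earlier]
        for earlier, later in zip(years, years[1:])
    }


# Hypothetical index terms attached to messages in each year (illustrative only).
terms_by_year = {
    1998: ["object", "component", "agent", "pattern"],
    1999: ["object", "object", "component", "pattern"],
    2000: ["object", "object", "object", "component"],
    2001: ["object", "object", "object", "component"],
}

entropy_by_year = {year: shannon_entropy(t) for year, t in terms_by_year.items()}
deltas = entropy_rate_of_change(entropy_by_year)

for year in sorted(deltas):
    print(year, round(entropy_by_year[year], 3), round(deltas[year], 3))
```

Under these assumptions, a year-to-year difference that approaches zero would correspond to a lexicon that is stabilizing.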


9. Experiment 0 “Count Every Message - Everywhere” 

The first experiment, which we refer to as experiment 0, starts to quantify these 
diffusion concepts for software engineering. The data resulting from the experiment are 
shown in Figure II-3, which illustrates the message-counting approach of Rogers for 
the technology. The number of messages published in a given year is on the Y-axis, 
and time in years is on the X-axis. The curves are described here from the lowest to the 
highest. The lowest curve, marked with diamonds (◇), represents Ph.D. dissertations and 
Master’s theses. The second curve from the bottom, marked with squares (□), represents 
technical reports, proceedings and books. The third curve (second from the top), marked with 
triangles (△), represents articles. The top curve, marked with circles (○), represents 
citations from Applied Science and Engineering Abstracts. 


[Figure: “Software Engineering” Technology Diffusion Measured by “Messages” Generated 
(Saboe 2000). Legend: Ph.D. and Master’s Theses Worldwide, n=628, yrs=30; Books/Tech 
Proceedings, n=5226, yrs=50; Index of Articles, n=3764, yrs=10, Journals Universe = 12500; 
Applied Science and Engineering Abstracts, n=1677, yrs=20.] 

Figure II-3 “Software Engineering” Messages Initial Data. (Source: Saboe 2000, 2001) 

The initial study, called experiment 0, evaluated the technology “Software 
Engineering”² to determine whether there was a better way to measure 
the maturation of technology. During this experiment, the effort looked at all print 
messages available. Software engineering “messages” were counted starting in 1968. 
The leading indicator messages appear to have grown out of graduate programs that 
performed research and published messages in the form of Master’s theses and Ph.D. 
dissertations. Searching Dissertation Abstracts, 628 of these messages were found over a 
30-year period. Such messages also appeared in the form of books and technical 
proceedings; 5226 of these book and technical proceedings messages were found from a 
source going back 50 years. Messages in the form of articles in abstracted journals had a 
yield of 3764 messages, over a 10-year period, from a journal universe of 12,500 journal 
titles. Messages similar to these were searched in another source, the Applied Science 
and Engineering Abstracts. The result was 1677 messages over a 20-year period. This 
yielded the data shown in Figure II-3. The data for this chart are found in the appendix. 
This is a typical message-counting approach. Even when the data are not cumulative, we 
can see that there are general trends. 

² The term software engineering was introduced in 1969, at a NATO conference in Garmisch 
(Redwine 1984). 

We can make a few qualitative observations from the message-count data for 
software engineering. Looking at the messages published each year in Figure II-3, we 
get a sense of capacity. The research messages from the research institutions seem to be 
one of the limiting factors. Books and technical proceedings top out as well, also giving 
an indication of steady state capacity. Articles seem to be still growing. Articles are 
shorter and therefore more of an overview than the high-end messages in the form of 
technical reports, theses or dissertations. The capacity to produce these messages is 
not as limited. We know, for example, that many papers can come out of one in-depth 
Ph.D. dissertation. These high-end messages are where one would expect the new ideas 
to come from. Consulting with researchers, academics, and application developers, there 
is an intuitive feel that dissertations, theses, reports and papers mostly fuel new 
research and additional ideas (and create new companies), and that books and some papers 
fuel practical applications of research results. Books rarely have new ideas. They 
have mostly an educational function, integrating and restating ideas from the other sources 
in a form that is accessible to a wider audience. Books also have a filtering function: 
they select the most useful new ideas. An informal study by Potter (Potter 2000) 
that traced the software engineering topics covered by all editions of Sommerville and 
Pressman (two popular software engineering texts) observed that as techniques became more 
widely used, they were incorporated into the texts. Some topics migrated from graduate 
level courses to undergraduate courses, implying a more standard, less complex lexicon. 

It is easy to see that the capacity to produce high-end messages has stabilized. 
The academic research infrastructure is only capable of producing on the order of 100 
“new idea” messages per year. Producers of books and technical reports add another 300 
messages per year at capacity. While researchers producing high-end messages 
containing new information are not the only source of new information, we see they have 
a capacity limit in the number of messages produced. The capacity limit is expected to 
change with the nodal learning curve rates. The mind share (similar to market share) 
fraction of capacity devoted to each subject changes more rapidly. This is visible in the 
three entropy models. We allocate learning on a per node basis and mind share is 
reflected in the number of nodes. Rogers attributed the rapid rates of adoption in a 
technology to more nodes. In order to build a nationally competitive infrastructure, these 
are the types of leverage points to which research managers and government policy 
makers need to have access. 

While this is interesting, the message-counting approach is limited in its analytical 
value. It is a very labor-intensive effort to count every message with minimum 
quantitative yield that would enable better-informed decisions for proactive actions. 

The idea to find a representative sample of messages for the technology under 
examination pointed to professional societies. While their databases would not cover 
every message, they would yield a rich enough source to potentially bear meaningful, 
statistically representative fruit. 

10. Crossing the Chasm (Moore 1991) 

Moore (Moore 1991) identified a chasm between the early adopters and the early 
majority. Fissures were identified between the other adopter segments of the community. 
At least two factors contribute. First, the communication channel between the segments 
of the community may be non-existent or spotty. Second, even if the communication channel 
exists and is established, there is an impedance mismatch between advocates and 
receptors in different community segments. One could speculate that if a model were 
developed that included a notion of momentum, with momentum developed from the 
entropy data, then conditions could be arranged so that enough momentum could enable 
“jumping” across the chasm and fissures due to potential and pressures. This notion of 
momentum is deferred in this dissertation to areas of future research, but the models 
developed may have a momentum property. 



Figure II-4 Crossing the Chasm (Moore 1991) 

11. States of Software Technology 

Redwine et al. (Redwine 1984) studied 14 different cases in considerable detail. 
They identified 5 major phases, with 2 sub-phases of popularization (4a and 4b), that a 
technology passes through as it matures. Figure II-5 shows the states. While the analysis 
is extremely good for the cases studied, there is some imprecision in states 4a and 4b, 
e.g. popularization throughout 40% and 70% of the community respectively. It is 
extremely difficult to determine, based on their methods, what the 
size of the total community is. 

For example, citrus fruit was known to cure scurvy 200 years before the British 
merchant navy adopted the practice. The Royal Navy took 400 years to adopt the 
practice. One would think it was the same community. Yet the Royal Navy could 
impress sailors and was not really concerned about attrition, so its impetus to adopt 
came quite late. The merchant navy, meanwhile, faced a different set of realities. By most 
standards, we would think of this as one community. In the software community, there is 
also a spectrum. The realities of resource-constrained systems in the embedded world 
have kept that community from adopting techniques, such as CORBA, that are used in the 
general purpose processing world of management information systems. 

They also make a flat statement that a technology matures in about 1+1- years. It 
turns out that this is a very difficult statement to support. On the other hand, they 
identified several points where we can observe output. 

We can observe a report or paper (a message) that identifies when there is a 
problem that exists (phase 0). The observable facts in concept formulation (phase 1) are 
general publication (messages) of solutions to parts of the problem. Innovators, in 
Rogers’ terms, would generally be found in phases 0 and 1. Clear definition of a solution 
via a seminal paper (a message), or demonstration system is the marker for the phase of 
development and extension (phase 2). While a demonstration system is generally 
documented in a paper or report, which we can count in the proposed method, a 
demonstrator is still a message. Internal enhancement and exploration illustrating usable 
capabilities which are available is a message (phase 3). In phases 2 and 3, you would 
expect to see the early adopters. When the technology is used outside the initial 
development group (phase 4), we see more observable messages. This is also where it 
moves to the broader consumer community. It is at phase 4 that the early majority and 
late majority are generally observed using the technology. 

Each of these observations can be viewed as a message. More particularly, these 
messages are reported in the literature, which is professionally indexed and abstracted. 
During the validation of this research, data has been gathered on five of the 
fourteen technologies in addition to more current technologies. While there may be a 
method to map these clearly observable state transition points to entropy curve 
characteristics³, i.e. 1st and 2nd derivatives, inflection points, stochastic 
dominance, etc., this has yet to be done. 
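
To make the idea of curve characteristics concrete, the following is a minimal sketch of how first and second differences of a sampled curve can flag candidate extrema and inflection points. The yearly values are invented, the function names are hypothetical, and nothing here maps the results to Redwine's states, which the text notes has yet to be done.

```python
def finite_differences(values):
    """Return discrete analogues of the 1st and 2nd derivatives of a sampled curve."""
    first = [b - a for a, b in zip(values, values[1:])]
    second = [b - a for a, b in zip(first, first[1:])]
    return first, second


def sign_changes(seq):
    """Indices i where seq[i] and seq[i + 1] have strictly opposite signs."""
    return [i for i, (a, b) in enumerate(zip(seq, seq[1:])) if a * b < 0]


# Hypothetical yearly values of an S-shaped curve (illustrative only).
curve = [0.10, 0.20, 0.45, 0.80, 1.05, 1.15, 1.18]

first, second = finite_differences(curve)

# A sign change in the first differences brackets a local maximum or minimum.
extrema = sign_changes(first)

# A sign change in the second differences brackets an inflection point.
# second[i] is centred on curve[i + 1], so index i brackets samples i + 1 and i + 2.
inflections = sign_changes(second)

print("extrema at difference indices:", extrema)
print("inflections at second-difference indices:", inflections)
```

With these sample values the curve rises throughout, so no extremum is flagged, while the single change in curvature is bracketed by the third and fourth samples.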


[Figure: States of Software Technology Transition (Redwine 1984). Phases and the 
transition points between them, listed from the top of the figure to the bottom.] 

Popularization 
a - throughout 40% of the community 
b - throughout 70% of the community 
• appearance of production-quality, supported versions 
• commercialization and marketing of the technology 
• propagation of the technology through a receptive community of users 

Transition point: Substantial Evidence of Value and Applicability 

4 Enhancement and Exploration (external) 
• same activities as Enhancement and Exploration (internal), but they are carried out by a 
broader group, including people that have not been involved in the technology maturation 
up to this point 

Transition point: Shift to Usage Outside the Development Group 

3 Enhancement and Exploration (internal) 
• major extensions of the general approach to alternative problem domains 
• use of technology to solve real problems 
• stabilization and porting of the technology 
• development of training materials 
• derivations of results indicating value 

Transition point: Usable Capabilities Come Available 

Development and Extension 
• trial, preliminary use of the technology 
• clarification of the underlying ideas 
• extension of the general approach to a broader solution 

Transition point: Clear Definition of a Solution Approach via a Seminal Paper of a 
Demonstration System 

Concept Formulation 
• informal circulation of ideas 
• convergence on a compatible set of ideas 
• general publication of solutions to parts of the problem 

Transition point: Appearance of a key Idea underlying the technology or a clear 
articulation of the problem 

Basic Research 
• investigation of ideas and concepts that prove fundamental to the technology 
• general recognition that a problem exists and discussion of its scope and nature 

Figure II-5. States of Software Technology Transition. (Source: Saboe 2001, Redwine 1984) 


12. Software Technology Transition Framework, Advocate/Receptor 


The Software Engineering Institute has been the single most prolific source on the 
subject of software engineering technology transfer. This is readily understood, since this 
Federally Funded Research and Development Center was established with a primary 
mission of transferring software engineering technology to the Department of 
Defense. Fowler (Fowler 1994) developed a framework for technology transfer 
identifying advocates and receptors (change agents) mediating between producers and 
consumers (see Figure II-6). In this work, three life cycles of technology transition are 
presented: research and development, new product development, and implementation. 
The SEI studies identify the need for common terms among receptors, consumers, and 
researchers as an important aspect. This dissertation’s model accounts 
for this finding by examining the conditional probability, e.g. the input terms influencing 
the output terms (see 4. Conditional Entropy, p107). A clear signal between a sender and 
receiver, with minimum noise and little need for requests for feedback, improves 
technology transfer. 

³ Any undergraduate calculus book tells us that setting the 1st derivative equal to zero 
locates candidate local maxima and minima. Setting the 2nd derivative equal to zero 
locates candidate inflection points. Those points and the 1st and 2nd derivatives show the 
character of the curve: how the slope changes, and whether the curve bends upward or downward. 


[Figure: Software Technology Transition Framework. Producer Consumer Model with 
Advocates and Receptors (Fowler 1994).] 

Figure II-6. Software Technology Transition Framework. (Source: Fowler 1991) 

This research does not address the lower level implementation details of that 
framework; rather it builds an analytical framework useful to determine probability of 
success and quantity and redundancy of messages that need to be sent as a clear signal. 
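
The notion of a clear signal can be illustrated with the standard conditional entropy H(Y|X) of received terms Y given sent terms X. The sketch below is only an illustration under stated assumptions: the (sent, received) term pairs are invented, the function name is hypothetical, and the dissertation's own conditional entropy development may differ in detail.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """H(Y|X) in bits, estimated from observed (x, y) pairs."""
    joint = Counter(pairs)
    marginal_x = Counter(x for x, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (x, _y), n_xy in joint.items():
        p_xy = n_xy / total            # joint probability p(x, y)
        p_y_given_x = n_xy / marginal_x[x]  # conditional probability p(y | x)
        h -= p_xy * math.log2(p_y_given_x)
    return h


# Hypothetical (term sent by advocate, term used by receptor) observations.
clear_channel = [
    ("reuse", "reuse"), ("reuse", "reuse"),
    ("pattern", "pattern"), ("pattern", "pattern"),
]
noisy_channel = [
    ("reuse", "reuse"), ("reuse", "library"),
    ("pattern", "pattern"), ("pattern", "template"),
]

h_clear = conditional_entropy(clear_channel)
h_noisy = conditional_entropy(noisy_channel)
print(h_clear, h_noisy)
```

In this toy data, the shared lexicon leaves no residual uncertainty about the received term, whereas the mismatched lexicon leaves one bit per message, which is the kind of noise that would prompt requests for feedback.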


Significant additional work (Forrester 2000, Fowler 1992, Fowler 1992a, Fowler 
1990) has been developed at the SEI. This work primarily focuses on the lower level 
implementation details of the framework, e.g. methods on how to plan and effectively 
communicate technology to an organization. 

Saboe (Saboe 2001) has related the framework of Forrester to the early phases 
and state transition points of Redwine and Rogers (see Figure II-7). Producers are 
generally in the early phases (0-2) of Redwine’s model. Early adopters are in phases 
1 to 3 of Redwine’s model. The early majority and the consumer pick up from 
phase 3, through the late majority and other consumers of the technology. 


[Figure: States and Producer Consumer Software Tech Tx Model (Saboe 2000). State 
transition points shown, from top to bottom: Substantial Evidence of Value and 
Applicability; Shift to Usage Outside the Development Group; Usable Capabilities Come 
Available; Clear Definition of a Solution Approach via a Seminal Paper of a Demonstration 
System; Appearance of a key Idea underlying the technology or a clear articulation of the 
problem.] 

Figure II-7. Mapping of the SEI Transition Framework and Redwine’s Stages. (Source: Saboe 2001) 


13. Thermodynamics Example in Technology Transition States 

Let’s review some of the history of a technology that 
is very relevant to this research: thermodynamics. As we know, the gestation period for 
this “technology” took well over 100 years. Figure II-8 uses thermodynamics as an 
example technology mapped to the states identified by Redwine. 

Thermodynamics can be defined as the science of energy (Çengel 1989, p. 2). 
Energy is viewed as the capacity to do work or as the ability to cause changes. One of 
the most fundamental laws of nature is the principle of conservation of energy. This 
states that the total amount of energy is constant. Thermodynamics deals with conversion 
of energy from one form to another. It deals with properties of elements under study and 
the changes in those properties as the result of energy transformations and interactions. 
During an interaction, the energy in a system can change from one state to another. 
Thermodynamics defines a control volume, boundaries, etc., that represent the system 
under study. It turns out that the principles of thermodynamics can be applied to any 
conserved property, e.g. energy, momentum, mass. This is now covered in many 
undergraduate texts on thermodynamics and physics (Fraundorf 2000). It is useful to 
apply these principles to information as well. Either the information is conserved and 
useful, or it is noise (entropy). This section develops similar properties for software 
technology transfer. We call this Technology Transition Dynamics (TechTx Dynamics). 
The principles are constructed in such a manner as to support extension to software 
development and software itself. 

It is useful to spot key points in the development of thermodynamics. It was not 
until 1700, when Newcomen and Savery were developing the steam engine, that the need 
arose for studying the problem. The first clear articulation of a problem was in 1700 
(State0, "Clear Articulation of Problem"). The first seminal paper occurred in 1849, 
when Lord Kelvin published the term “thermodynamics” (State1, "Seminal 
Paper"). Rankine published the first textbook ten years later, in 1859 (State2, "Usable 
Capabilities Come Available"). Practical development (State4, "Substantial Evidence of 
Value and Applicability") is evidenced in the early 1900s: Gibbs in 1902 with his 
"Elementary Principles of Statistical Mechanics", and Fowler and Tolman in 1936 and 1938 
with their publications "Statistical Mechanics" and "Principles of Statistical Mechanics" 
respectively. It can easily be argued that by 1953-54, thermodynamics had reached 
State4b, "Popularization Throughout 70% of the Community". Popular texts by Shapiro, 
"The Dynamics and Thermodynamics of Compressible Fluid Flow", and Lee, "Theory and 
Design of Steam and Gas Turbine Engines", saw widespread use for decades. 


[Figure: Thermodynamics Technology State Transition.] 

Figure II-8 Thermodynamics Technology Transition State Example 


For the purposes of the domain of knowledge for software engineering 
(technology transition dynamics), the key State1, "Seminal Paper" state transition point 
occurred with Shannon in 1948. Claude Shannon is considered the founder of 
information theory. He is regarded by some as a modern equivalent to Newton. Shannon 
picked up the thermodynamic notion of entropy and applied it initially to 
communications theory⁴. This theory is the underpinning of modern information theory. 
This can be seen in the top block of Figure II-9. 


⁴ We saw a hint of the future in the 1959 statement by Lyapunov after noting Shannon’s 
work: 

“Описанные работы представляют собой первые шаги в области 
математических задач кибернетики. Они объединены некоей общей 
направленностью замыслов, которую можно характеризовать как начало разработки 
общей метрической теории алгоритмов или теории алгоритмов с оценками. Однако 
построение такой теории является еще делом будущего”. 
[Figure: Software Technology Transition - State Example. 
Milestones shown: Saboe, Software Engineering, 2001; Prigogine, Dynamical Systems, 
Information, Evolution, 1980; Kolmogorov, Complexity, 1964; Jaynes, Information Theory, 
Statistical Mechanics, 1957; Shannon, Communication Theory, 1948; Fowler, Tolman, 
Statistical Mechanics, 1936, 38; Clausius, Entropy, 1850; Boltzmann, 1860s; Gibbs, 1902. 
Important parallel developments: Bernoulli 1713, Bayes 1763. 
State transition points, from top to bottom: 
Substantial Evidence of Value and Applicability 
Shift to Usage Outside the Development Group: practical developments and use to solve 
real problems, 1900 
Usable Capabilities Come Available: Rankine, first textbook, 1859 
Clear Definition of a Solution Approach via a Seminal Paper of a Demonstration System: 
Lord Kelvin paper, thermodynamics, 1848 
Appearance of a key Idea underlying the technology or a clear articulation of the problem: 
Newcomen, Savery, steam engine, 1700] 


Figure II-9 Software Technology Transfer - State Transition Example 


There are several tracks that finally converge to get us to the point of this 
research. Thermodynamics converges with information theory, and in this research, we 
tie together thermodynamics, information theory, control theory, dynamical systems, and 
learning curves with software engineering technology transfer. Later, we suggest that the 
nodes, arcs, and entropy measure are relevant to software development and software 
itself. 


One of the drawbacks of this view is that everything listed in the 
upper area of Figure II-9 is primarily the result of work by investigators outside of the 
thermodynamics community. In the upper block, we have the convergence and mixing of 
several threads of technologies from different domains. We see the probability work by 
Bernoulli and Bayes in statistics, which has its own set of state transitions. We also see 
thermodynamics and information theory, and finally dynamical systems inspired by biology 
and the evolution of life itself. 

⁴ (translation) “The efforts described here represent the first steps in the area of 
mathematical problems in cybernetics. They are linked together by some common idea, which 
can be characterized as the starting point of the development of the common metric theory 
of algorithms, or the theory of algorithms with estimates. However, the development of this 
theory is still to be accomplished in the future”. (Halstead 1977, p4, Translation 
Bankowski, 2001). We will run into Lyapunov again. 

Yet there is definitely a foundation laid by the thermodynamics work. On the 
other hand, if we start with Shannon, we can see a parallel set of states (shown on the 
right of the figure as local state transitions) using the thermodynamics foundation as an 
input. 

With his publication of "A Mathematical Theory of Communication", Shannon provided the 
discipline with a crystallizing and focusing seminal paper. The precursors to 
this at State0, with the "Clear Articulation of the Problem and Appearance of Key Ideas 
Underlying the Technology" in communications, information and mathematical theory, 
were the works of Bernoulli (1713), Bayes (1763), Gibbs (1902, 1928), Szilard (1929)⁵, 
von Neumann (1944), Kolmogorov (1956), Jaynes (1957, 1957a), Kulch (1972), 
Uspensky (1992), and Li (1993). We consider the use of these developments in this 
research to be prima facie evidence that those technologies have had substantial evidence 
of value and that they are being applied by an outside group - software engineering. 

14. Extension to Address Standardization Effects (Fichman 1993) 

Fichman and Kemerer (Fichman 1993) focused on organizational and 
community-wide technology adoption. They develop a two-dimensional framework 
based on theories relating to organizations and communities. In particular, they bring the 
economics of standardization into the software 
engineering process technology literature for the first time. This work points out four economic factors 
affecting technology adoption: prior technology drag, irreversibility of investments, 
sponsorship, and expectations. These are summarized as follows: 


⁵ It is of interest to note that the Szilard engine was described in 1929, Z. Phys. 53 (1929) p 840-856, 
according to Zurek (Zurek 1989). In this paper, titled “On the decrease in entropy in a thermodynamic 
system by the intervention of intelligent beings”, Szilard discovered the relationship between 
information and entropy. 

a. Prior Technology Drag 

A prior technology provides significant benefits because there is a large 
and mature installed base. The model in this research enables the quantification 
and detection of “pushback” through measures of the entropy, e.g. the terms of the 
technology show up more and more in the community lexicon. The models proposed 
suggest that the more familiar the terms, the less likely the technology will be resisted, or 
pushed back (Zipf’s law of minimum effort⁶), and the fewer requests for clarification will 
be required. This research will explicitly show the relationship of entropy, state 
transitions and frequency of performing a task (producing a message) (see 3. Two 
Dimensional Finite Difference Representation of S Ht, p141). We know from other 
learning curve studies that the more times a task has been performed, the less time is required 
to perform the task. This learning represents an increase in performance efficiency. This 
too is closely related to the law of minimum effort. This research suggests that in the 
TechTx Basic Entropy model, the measure of entropy, as input, gives a synthetic metric 
for the technology drag. 
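
The learning-curve effect described in this subsection is often written as a power law. The sketch below assumes Wright's classic form, in which the time for the n-th repetition is the first-unit time multiplied by n raised to log2 of the learning rate. The source does not say which learning-curve model is intended, so the model choice, the function name, and the numbers are all assumptions for illustration.

```python
import math


def unit_time(first_unit_time, n, learning_rate):
    """Time to perform the n-th repetition under Wright's power-law learning curve.

    learning_rate is the fraction of time retained each time the cumulative
    number of repetitions doubles (e.g. 0.8 for an "80% curve").
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return first_unit_time * n ** math.log2(learning_rate)


# Hypothetical example: the first message takes 10 time units to produce,
# and each doubling of experience reduces the unit time to 80%.
times = [unit_time(10.0, n, 0.8) for n in (1, 2, 4, 8)]
print([round(t, 2) for t in times])
```

Each doubling of the number of repetitions scales the unit time by the same factor, which is one simple way to express an increase in performance efficiency.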

b. Irreversibility of Investments 

Adoption of the technology requires irreversible investments in areas such 
as products, training and accumulated project experience. As discussed in the 
Introduction (2. Context and Overview, p9), the flow of correlations yields 
irreversibility. For example, once the money is spent on a technology, it is gone. It 
cannot be spent again. Another example is closer to the thermodynamic aspect of 
irreversibility. Once the community, or a node in the community, is exposed to a 
technology, you cannot unexpose them. The future is influenced by that exposure to a 
product, training and prior experience. This dissertation addresses prior experience, training and 
exposure through the entropy aspect of the model. In the control theory part of the 
model, the requests for feedback become fewer if the input messages are well 
understood by the resources and assets in the node. 

⁶ Zipf’s law of minimum effort is really a social structure representation of thermodynamic and 
Newtonian principles, “which says to pass from the initial position [or state] occupied at instant t0, to the 
final position occupied at t1 the system must describe a path that in the interval of time between the instant 
t0 and t1, the mean value of the action - the difference between the two energies T [kinetic, a function of 
mass and velocity] and U [potential energy depending only on the coordinates or structure] must remain as 
small as possible.” (Poincare 1903 p63). Bayes, using geometric methods, makes a similar 
argument but in terms of probability, expectations, and variance (Bayes 1763). In the latter case, Bayes, 
and the former, Zipf, there is no reference to the material under examination. This reinforces that the principle 
is not limited to physics and can apply to information correlations as well. 

c. Sponsorship 

Fichman suggests that strong sponsorship seems to be beneficial in moving a
technology to standardization when a single entity (person, organization, consortium)
exists to define the technology, set standards, subsidize early adopters, and otherwise
promote adoption of the new technology. The models in this dissertation reflect that
conjecture in two ways, one explicit and the other implicit. Explicitly, if the terms in use
have been widely accepted as standard, this reduces the noise in the
producer-(advocate-receptor)-consumer lexicon, increasing the mutual information used.
This reduces the rate of change of the entropy. Also, large quantities of messages
published each year, with a limited number of new terms introduced, would reflect
sponsorship. Even if there were not a single entity with resources focused to promote the
technology, the models would suggest that the technology is approaching stability and
converging. While the model does not address resources explicitly, the result of resource
expenditure is seen in messages. A mass of messages with the same vocabulary reduces
entropy, moving the vocabulary toward stability. Additional new messages with new terms
in the vocabulary, at a rate greater than the usage of the existing vocabulary, retard the
movement toward stability and convergence. Consider a sponsor that is providing
resources for a given technology. Researchers, knowing that there is a customer, will
direct their efforts to producing messages in the desired technology’s lexicon. They are
reacting to a potential. It takes time and effort to produce the messages. The more heads
resourced in a band to address the technology issue, thanks to sponsorship, the greater
the message output. The change in entropy resulting from new messages produced by that
effort implies resource consumption to produce the messages. The stability and
convergence (i.e. decrease in the rate of change of entropy) suggest the lexicon is
becoming standardized. This may be de facto. The vocabulary, communication network
approach and the change agent (sender - receiver) aspect of the model address this
factor, which was seen as desirable and identified by Fichman and Kemerer (Fichman 1993).
More interesting is that exercising the model by varying the number of performers in a
band, or the mix of a portfolio of bands, will affect the modeled output. This analysis
would suggest the prescriptive remedy a program or research manager should take to
reduce risk or accelerate the arrival of a technology.
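
The effect of vocabulary reuse versus new terms on entropy, as described above, can be illustrated with a small sketch. This is not the dissertation's model: the term names, the counts, and the choice to measure Shannon entropy over a simple term-frequency table are all assumptions made for illustration. It also assumes the reused messages concentrate on the already dominant term.

```python
import math
from collections import Counter


def shannon_entropy_bits(counts):
    """Shannon entropy (bits) of the term-frequency distribution in `counts`."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)


# Hypothetical starting vocabulary: term -> number of messages using it.
baseline = Counter({"term_a": 60, "term_b": 30, "term_c": 10})

# Case 1: 100 more messages reuse the dominant, already accepted term.
reuse = baseline.copy()
reuse["term_a"] += 100

# Case 2: 100 more messages each introduce a term not seen before.
novel = baseline.copy()
for i in range(100):
    novel[f"new_term_{i}"] += 1

h_base = shannon_entropy_bits(baseline)
h_reuse = shannon_entropy_bits(reuse)
h_novel = shannon_entropy_bits(novel)

print(f"baseline: {h_base:.3f} bits")
print(f"reuse:    {h_reuse:.3f} bits")
print(f"novel:    {h_novel:.3f} bits")
```

In this toy case the reuse scenario lowers the entropy relative to the baseline, while the scenario with many new terms raises it, which matches the qualitative argument in the text.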

d. Expectations 

Technology benefits from an extended period of widespread expectations 
that it will be pervasively adopted in the future. This research sets up the ability to
further analyze the notion of expectations and deals with the expected value of terms in a
technology (1. Entropy Review, p98). The inference is that the more likely that a set of
terms is expected to be found related to a technology, the less uncertainty there is relative 
to those sets of terms and that subject technology. This reduces risk, and increases the 
probability of use - if the technology is useful for the problem at hand. However, this is 
the topic for further research as identified in the final sections. Work addressing 
mathematical concepts of momentum and potential can be developed based on the 
elements of the initial model. 

The work by Fichman and Kemerer also identifies attributes of 
innovations. Although Rogers addressed and identified five generic attributes of 
innovation (1) relative advantage, (2) compatibility, (3) complexity, (4) trialability, and 
(5) observability, his work is based mostly on study of individuals. Van de Ven (Van de 
Ven 1991) argues that these same innovation attributes play an important role in 
adoptions by organizations. The Rogers’ attributes have been generally adopted by the 
community. This appears to be due to familiarity (correlations of terms) with the 
attributes in the diffusion of innovations community. Others (Moore 1987), (Kwon 1987)
use these as well. Alternate taxonomies show up in Leonard-Barton (Leonard-Barton
1988). They identify transferability, organizational complexity, and divisibility.
Pennings (Pennings 1987) identifies concreteness, divisibility and cost. Eveland and
Tornatzky (Eveland 1990) identify trialability, lumpiness, adaptability, degree of
packaging, and the “hardness” of the underlying science. Zelkowitz (Zelkowitz 1998)
relates different styles to Rogers’ attributes and characteristics of the adopter type. In
most cases, all of these can be mapped back to Rogers’ original attributes.

This research was constructed to address Rogers’ compatibility, 
complexity, trialability and observability in terms of the entropy metric. The entropy, 
specifically conditional entropy, addresses complexity of a technology and expectation of 
adoption. The trialability is inferred from the production index of the number of 
observable messages produced (i.e. messages produced per time step by a node). This 
research explicitly models the notion of Rogers’ complexity as the entropy of the set of 
sets of terms that a node takes as input. This research also explicitly relates the
production index and the input entropy; intuitively, this is related to trialability. The
more the portfolio of nodes can produce per time step, the more trials were performed
(based on research tasks produced per researcher). Relative advantage is addressed only
indirectly, but the mechanism is there to compare two or more competing technology
entropy metric curves and to determine the rate of change, crossover, and probability of
arrival of a technology’s maturity. Observability at the system level can also be seen in
the selection of technologies studied. The data from the technologies studied permit
future spotting of Redwine’s observable (first four) state transition points. This
represents five of the fourteen technologies Redwine studied. It is premature to say that
we can make any predictions by spotting observable points alone. However, future
research could spot the observable events and attempt to correlate probability of success
with the entropy metric.


15. Diffusion/Infusion Issues (Zelkowitz 1995) 

Zelkowitz (Zelkowitz 1995, 1998) has extensive experience with infusing
technology into organizations. Infusion is differentiated from diffusion in that it relates
to internal adoption by a particular target organization, while diffusion generally refers
to movement of the technology to the broader user community in a macro sense. His study
within NASA builds on the “experience factory” work with NASA’s Software
Engineering Lab and the experimental approaches of Basili (Basili 1994, 1994a). He
studied the differences in the industry-wide phenomenon of a technology, specifically
focusing on the infusion process, which actually makes the changes in the current state of
technology. The TechTx Entropy Feedback model (p148) provides a mechanism to
address the infusion process in the transfer function. The function takes as input new
messages and interactions resulting from feedback and produces output. Successfully
retransmitted messages from a change agent (receptor) to a consumer represent infusion
in a particular organization. The feedback model is abstract, but is constructed in a way
to permit lower level implementation details to be added which address infusion. The
fraction of messages that need clarification (β) (introduced in the TechTx Basic Entropy
Feedback model) represents a kind of efficiency of the infusion process. The percentage
(1 - β) of the world messages related to the material is well understood in highly
encrypted messages, and without a lot of noise, the technology is passed directly to the
consumer. At the macro diffusion level, looking at the entropy rate of change for the
ensemble of nodes, we see the associated clarification fractions (β’s), which give us the
average rate of requests for feedback (lack of understanding) of a technology. This in
turn can be fed to infusion, where the technology program manager and adopter
organization can further study the details of the infusion process. Individual β values for
an organization and a given technology can be measured, if it is so desired.
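
As a minimal sketch of how a clarification fraction might be measured, the snippet below computes β per node and a message-weighted average over an ensemble. The `NodeTally` record, the organization names, and the counts are hypothetical. The dissertation's feedback model is not reproduced here, only the simple ratio that the text describes.

```python
from dataclasses import dataclass


@dataclass
class NodeTally:
    """Hypothetical per-node message tally for one technology."""
    name: str
    received: int              # messages received by the node
    needed_clarification: int  # messages that triggered a request for feedback


def beta(node: NodeTally) -> float:
    """Fraction of received messages that needed clarification."""
    if node.received == 0:
        raise ValueError(f"{node.name}: no messages received")
    return node.needed_clarification / node.received


def ensemble_beta(nodes: list[NodeTally]) -> float:
    """Message-weighted average clarification fraction over all nodes."""
    total = sum(n.received for n in nodes)
    return sum(n.needed_clarification for n in nodes) / total


nodes = [
    NodeTally("org_1", received=200, needed_clarification=50),
    NodeTally("org_2", received=100, needed_clarification=10),
]
per_node = {n.name: beta(n) for n in nodes}
avg_beta = ensemble_beta(nodes)
passed_directly = 1.0 - avg_beta  # the (1 - beta) share needing no feedback
```

Weighting by messages received is one possible design choice. An unweighted mean of the per-node values would treat small and large organizations equally.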

16. Technology Transfer and the Learning Curve (Nishiyama 2000), (Hanakawa 1998)

During infusion, there is evidence that the learning curve is in play. Factors include
the skill level, the improvement in productivity due to the technology, the productivity
loss during transfer, and their combined effect, the net gain (Nishiyama 2000). The
impacts of the learning curve on assimilating a new technology into a project were seen in
the number of tasks performed over a study of several projects (Hanakawa 1998). This
study in software development and others suggest that the learning curve of Newell and
Rosenbloom (Newell 1981) for power law chunking is appropriate for the various types of
learning that need to be handled. This research looks at the learning curve as a local,
process efficiency function to refine the basic control model with the power law learning
curve. This can be extended to a power law that uses the chunking model equations. While
this is not important for the development of the basic model in this research, it provides
the linkage to all manner of studies of organizational learning and, ultimately, the
breakeven and return on investment curves (Nishiyama 2000). This can be developed to
make resource decisions, both for the infrastructure and for a specific research program
or organization.
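
A power law learning curve can be sketched in a few lines. The form used here, T(N) = B * N^(-alpha), is the simplest common statement of the power law of practice. The parameter names and values are assumptions chosen for illustration, and the chunking model equations mentioned above are not implemented.

```python
def time_per_task(trial: int, first_trial_time: float, alpha: float) -> float:
    """Time needed on the N-th repetition under T(N) = B * N**(-alpha).

    first_trial_time is B, the time on the first trial; alpha > 0 is the
    learning rate. Both are illustrative parameters, not fitted values.
    """
    if trial < 1:
        raise ValueError("trial numbers start at 1")
    return first_trial_time * trial ** (-alpha)


# Hypothetical project: the first task takes 10 hours, alpha = 0.5.
curve = [time_per_task(n, first_trial_time=10.0, alpha=0.5) for n in range(1, 6)]
for n, hours in enumerate(curve, start=1):
    print(f"trial {n}: {hours:.2f} hours")
```

Under these assumed values the time per task falls with each repetition, quickly at first and then more slowly, which is the shape that would feed a local process efficiency term.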

There is a broad base of literature on learning curves. During the study for this
research, a large number of papers were reviewed (Anderson 1981, Gulliksen 1934,
Knecht 1974, Langley 1981, Lewis 1981, Mazur 1978, Newell 1981, Nembhard 2000,
Miller 1956, Vigil 1994, Yelle 1979), and many more.

These papers developed the basic relationships from learning curves through
relevance to software engineering. Anderson (Anderson 1981) is from Carnegie Mellon
University, and the book he compiled under NSF and DARPA funding has a strong bent
toward showing the relevance to software development (Langley 1981), (Lewis 1981),
(Newell 1981). Linkages to distributions of terms, statistics of language, and Zipf’s
law for the principle of least effort are connected through (Mandelbrot 1953), (Simon
1955), (Snoddy 1926), and (Zipf 1949, 1965).

17. Mapping of Motives of Actors (Pfleeger 1999) 

While the work by Pfleeger (Pfleeger 1999) never explicitly defines technology 
transfer, it provides the most comprehensive literature summary of the essential software 
technology literature. While not addressing all of the transfer field literature, or even all 
of the software technology studied in this area, the paper is an excellent review, a great 
overview and starting point. There are several key contributions beyond the survey of the 
field. She describes the process and roles involved in order to move technology in a 
transition from idea (technology creation) to adoption (technology diffusion). The 
generation of evidence, packaging, support and attention to the audience are identified as 
essential elements in the process of transfer. In this research, these characteristics are
primarily addressed in the clarification (β) in the control model. The clarification (β)
values are driven by the commonality of terms to the audience, measured in terms of the
frequencies and entropy metric.


Pfleeger also maps the motivations of the adopters to the category of adopter 
(innovators, early adopters, etc. per Rogers 1983) (Table II-2). Also identified are the 
effects of rules imposed on an organization, a standards committee or a customer. These
rules can encourage the success of a technology (through push or pull) when other models
fail. For instance, she cites the effect of the Department of Defense’s endorsements of
products, recommendations for process improvement, or mandatory rules about tools as a 
positive influence to encourage “laggards” to take risks and try new technologies. The 
successful technology requires not only a new idea, she claims, but also a receptive 
audience with a particular adoption style. The various models (people mover, 
communications, on the shelf, vendor and rule as introduced by Pfleeger) are mapped to 
the level of risk the adopter community is willing to take. 


Adopter Category    Level of Risk    Adopter Model
Innovators          Very High        People-mover model
Early adopter       High             Communication model
Early Majority      Moderate         On-the-shelf model
Late Majority       Low              Vendor model
Laggards            Very Low         Rule model

Table II-2 Relationships among Adopters, Risk and likely Transfer Model.
(Source: Pfleeger 1999)

So, to reduce the impedance mismatch between the researcher and the method of
moving the technology, the “message” has to be matched with the audience. While Pfleeger
cites Zelkowitz and other studies that look at the actual implementation details of the
transfer process, it is useful to note the factors that affect clarification requests (β) in
this research. Another way to view the stream of messages is to suggest that all that does
not move to the consumer is in the feedback-entropy streams. Pfleeger, Zelkowitz, the SEI
and others generally are looking at the implementation details of technology transfer. All
of the research to date generally looks at technology transfer from this perspective. This
research addresses a macro process, useful to research managers and program
managers, to assess the risk of the technology maturing at a given time. Implementation
of a technology in a specific program should try to minimize the clarification requests
(β). Using messages that are matched to the audience minimizes the mismatch. The
message is the packaging of the evidence. Pfleeger (Pfleeger 1999) and Schum (Schum
1994) describe evidence.


Types of Evidence                 Characteristics
Tangible                          Objects
                                  Documents
                                  Images
                                  Measurements
                                  Charts
                                  Relationships
Testimonial (unequivocal)         Direct Observations
                                  Second-hand
                                  Opinion
Testimonial (equivocal)           Complete equivocation
                                  Probabilistic argument
Missing tangibles or testimony    Contradictory data
                                  Partial data
Authoritative records or facts    Legal documents
                                  Census data

Table II-4. Messages in Forms of Evidence.
(Source: After Schum 1994, Pfleeger 1999)


Schum presents the categories of evidence seen in Table II-4. The specific
observational sense, objectivity and veracity of the message enable decisions to adopt or
not adopt. In terms of this dissertation, if the message is clear, unambiguous, and well
understood, the advocate can pass on the message to the receptor with little to no requests
for feedback. Schum and Pfleeger argue for this packaging of the message. This
research supports those observations with the Shannon entropy component, where noise
and non-signal are minimized, e.g. the vocabulary converges between advocate and
receptor.

In this research, we consider the message as representative of the evidence. The
risk is related to how often the terms in the message are expected to be used together by
the advocate (publisher of the message) and receptor (consumer of the message). For
example, we regularly read papers that give messages representing evidence that a
subject technology, combined with some other characteristic associated with the
technology, was used, examined, etc. The more frequently we see these pairs of terms
characterizing the use or examination of the technology, the more likely we would expect
to see this combination in the future.

Let’s consider a message representing evidence as a set of terms. For example, the
set of terms {}, {A}, {B}, {C} might be a message about technology {A} with
technologies {B} and {C}. The {} represents a null set in this alphabet for completeness.
We will see papers, which are a way of transmitting a message, in which combinations of
these terms are used. This alphabet can become a type of artificial language, with various
combinations of the terms. Potential single combinations are shown in Figure II-10. The
subtotal for q-level = 2 is seen to equal 6, and for q-level = 3 it equals 1. For
q-level = 1, all of the combinations are the same; while the count is equal to three,
there is really only one possibility, null. We will find that we cannot count three
instances of null, nor can we count one instance of nothing.

Terms     {}    A    B    C    Count

q=1
{}         0    1    1    1      3

q=2
A          0    0    1    1      2
B          0    0    0    1      1
C          0    0    0    0      0
                     Subtotal    6

q=3
AA         0    0    0    0
AB         0    0    0    1      1
BC         0    0    0    0      0
                     Subtotal    1
                        Total    7

Figure II-10 Potential Single Combinations.


In the first section, labeled q=1, we see pairings of the null set and the single terms.
This yields a set of sets of singles which are possible to be found. In the second group,
q=2, we see the pairings of the single sets from q=1 with the primitive set terms. The “o”
indicates we are not counting this combination because it is not unique and has been
counted already. For example, {AA} tells us we are counting {AA} if {A} appears
twice. Similarly {BB}, {CC}, or even {{}{}}, should we want to count all of the
combinations of nulls. In our case, counting the number of nulls, or cases where a term in
a set, e.g. {A}, appears twice, will be redundant; more on that in the next section. At
level q=3, we are performing the same type of binary combination. In this case, we are
combining the results of level q=2 with the basic set again. We can see that the chance of
finding {ABC} is one out of seven.
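
The counting argument can be checked with a short enumeration. The sketch below treats a q-level as the number of distinct terms in a set, ignores order and repeats, and excludes the null set, which is one reading of the figure. The one-in-seven figure additionally assumes every non-null set is equally likely.

```python
from itertools import combinations

terms = ["A", "B", "C"]

# Group the non-empty, order-free, repeat-free term sets by q-level (set size).
by_q_level = {
    q: ["".join(combo) for combo in combinations(terms, q)]
    for q in range(1, len(terms) + 1)
}

total_sets = sum(len(sets) for sets in by_q_level.values())
chance_abc = 1 / total_sets  # assumes all non-null sets are equally likely

for q, sets in by_q_level.items():
    print(f"q={q}: {sets} (count {len(sets)})")
print(f"total non-null sets: {total_sets}; chance of ABC: 1/{total_sets}")
```

The cumulative count through q=2 is six (three singles plus three pairs), and adding the single q=3 set gives seven.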

This can be represented in terms of bits and breaks: ••• | ••• | • | . Each group of
• |’s represents a possible accessible q-level. A set of terms with {A}, {AB} and nothing
in q=3 would be written • | • | |, where the double || indicates a null combination is
present. This set of sequences can ultimately be represented as a program. The
complexity of that program can be represented by Kolmogorov’s algorithmic complexity,
which is essentially Shannon’s (average) information entropy plus a constant.

Each of these subsets represents a possible way that a researcher may find this
message. Often, as we know, we use only elements of some research, that is, single or
double or larger sets of terms. Each of these is a legitimate accessible state of the
message. The higher q-level terms can be viewed as higher level concepts. You can see
by inspection that it is not possible using this approach to have a q=3 term without filling
the lower q-level states. At some point the combination of a q=2 or q=3 set of terms
can take on meaning as a primitive term in and of itself. These higher q-level sets take
on the meaning of a higher level of abstraction. They can then be considered
representations of a concept, which may be replaced by new single terms.

At that point, it becomes a q=1 set. We would expect that the higher level q sets
will exhaust when they become more and more frequently used. This seems to be
consistent with the abstraction discussion (Whitehead 1910) and learning models
(Newell 1980) (see 11. Abstraction, p91, and 10. Learning Curves, p90 and
Chapter III). Shannon (1948) illustrated this using a telegraphy notation where a birth or
death was simply represented by a few terms. The receiver understood that those few
terms implied that a baby boy was born on a certain date, and other appropriate details.
We do the same thing when we learn. We follow an economy of symbols and the
principle of least effort (Zipf 1949) discussed earlier.

We could look at the entire message of the publication (the article or report), and
we could, in fact, look at every term in the publication and determine the frequency of
occurrence of the set of sets of terms. If we were looking at every term in a message or
report, we could also populate the lower half of the matrix. This would permit the
determination that {AB} was different from {BA}, because {AB} has A preceding B and
the reverse is true in the case of {BA}. This is how the analysis would be done for a free
text study.

For the purposes of this experiment, we are using a bibliographic record and
examining only the descriptors. In a descriptor field, we would not expect a term to be
entered more than once, and the order is not significant in the data source used in this
study. We do assume that the terms in the descriptor field are representative of the topics
covered in a message. Further, we assume that the message terms in this field are
symbolic of the topic (technology) being discussed. This is reasonable, since this is the
reason an organization like the Institute of Electrical and Electronics Engineers indexes
material in this way. Similarly, the Library of Congress catalogs by index terms that are
representative of the document. It would make no sense to go through the trouble of
indexing material with irrelevant terms when the mission is to make the knowledge
available. The further assumption is that the frequency of occurrence of terms is
representative of the attention the subject matter is getting.

Imagine a world where the language was represented by a data set with 1000
records (messages). Of these, 997 records contained only term {A}, while the remaining
records contained the terms {AB}, {AC} and {ABC}. We would expect that this world
was generally concerned with term {A}. We can quantify this as a probability. The
probability of finding the single term {A} is .997, while that of finding {ABC} is .001,
a one in a thousand chance: not too likely, but possible. We can calibrate an expectation
of finding a term or particular combination of terms from these probabilities. This ability
to quantify the probability of finding terms, predicting the future based on present
expectations from distributions, is key to the approach used in this research.
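
The 1000-record world can be written out directly. The sketch below assumes the three remaining records hold one each of {AB}, {AC} and {ABC}, as the example implies, and computes the probabilities and the Shannon entropy of that distribution. The variable names are illustrative.

```python
import math
from collections import Counter

# Record counts for the imagined world: 997 records with only {A}, and one
# record each for {AB}, {AC} and {ABC} (an assumption drawn from the example).
records = Counter({"A": 997, "AB": 1, "AC": 1, "ABC": 1})

total = sum(records.values())
probabilities = {term_set: n / total for term_set, n in records.items()}

# Shannon entropy in bits: small, because almost every record is the same.
entropy_bits = -sum(p * math.log2(p) for p in probabilities.values())

for term_set, p in probabilities.items():
    print(f"P({{{term_set}}}) = {p:.3f}")
print(f"entropy = {entropy_bits:.4f} bits")
```

The entropy comes out at a few hundredths of a bit, far below the 2 bits that four equally likely term sets would give, which is the quantitative form of saying this world is almost entirely concerned with {A}.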

The context gives us a degree of insight into the likelihood that there will be more 
of the same to follow. Building from the needs of cryptography, this is where the 
fundamental work of Shannon (1948) started. He illuminated the way to make decisions 
based on these probabilities, using the concept of entropy. One can even represent the 
number of decisions needed for absolute certainty. This research launches from this
point.

B. TECHNOLOGY TRANSITION: ANNOTATED BIBLIOGRAPHY

There are many studies that have driven down into the implementation details of
technology diffusion and infusion. This section presents a survey of the relevant
technology transition literature that supports the development of the model presented in
this dissertation. This section also provides a link to the implementation studies that are
available to date, so that the model can benefit from organizational and technology β’s
unique to a local study.

The appendix of this dissertation contains an annotated bibliography in two parts. 
The first includes the basic work done by the SEI (Przybylinski 1988). The second part 
resulting from this dissertation research is the addition to that work and brings it up to 
date with a large number of newer citations. 

Many of these have been annotated with an abstract. In many cases, e.g. the
SEI-edited proceedings of the International Federation for Information Processing
Technical Committee 8 (TC8) Working Conference on Diffusion and Implementation of
Information Technology (Levine 1994), the annotated bibliography of each of the key
papers is included. With one exception, the key software technology transfer research
papers referenced in this section include the papers cited in those papers. This provides
an excellent starting point for future research. The exception is Rogers 1983, 1995, which
has twenty-four (24) pages of references (2700 references by Przybylinski’s count).
These references of Rogers represent the most important work on the broad topic 
“diffusion of information”, according to Rogers (Rogers 1983, p. 414). The SEI will 
soon be publishing this annotated bibliography, which includes the sites to the material 
that each bibliographic citation references (Saboe 2001b). 


C. STATISTICAL ELEMENTS OF THE TECHNOLOGY TRANSITION 
MODELS 

This section covers the definition of terms as used in this research. The use of
terms and aspects that factor into the development of the proposed technology transition
models is developed. The historical basis for the use, and the thread of connection to the
current research and between probability, information, and uncertainty, is described. A
general discussion of these terms in the context of information-communication theory is
presented. The notion of entropy in information theory, in statistical mechanics, and as
used in thermodynamics is introduced. We discuss the stochastic models and the
relationship to a dynamical system model. Elements introduced here are known in the
literature, and are accepted without the need to prove them. This section sets a context
and a point of departure for Chapter III.

1. Probability 

What is this technology transfer engine? Is it deterministic? Is it probabilistic?
While it may appear that we could have non-determinism here, this is not the case. If we
could know all of the inputs, there would be a deterministic relationship; however, it is
impossible to know all of the inputs. So, we must distinguish between non-deterministic
and probabilistic. We simply don’t have enough information to accurately predict the
result. This is because there are uncertainties in the input to the technology transition
process. There is a spectrum of distributed inputs, and this feeds a deterministic flow of
correlations ordered in time. These are due to deterministic interactions, which yield a
result.

As indicated before, we have an irreversible flow of correlations that are ordered in
time, just as there is a flow of communication in society. This leads to an equilibrium
solution if we have a technology that is stabilizing. There is a distribution of the input
variables at work, all with probabilities attached. This affects the likelihood of
discovering, extending and refining a technology, re-transmitting the technology and
acceptance of a technology. We are dealing with probability, uncertainty and risk. While
risk can be defined as the product of the probability of an event and the cost of the event,
for our purposes we deal with uncertainty and risk as the same thing. We ignore the cost
component in this development. In a real program office, the cost elements can be added
in later to perform trades and risk assessments. It does not matter whether an
"objective" classification is or is not possible. We deal with a "subjective" probability
concept (Hirshleifer 1992, p. 10, and Savage 1954).

2. Information, Uncertainty

Information is a difference in matter-energy [change of status - i.e. state] that
affects the uncertainty in a situation where a choice exists among a set of alternatives
(Rogers and Kincaid 1981). “Information is something which reduces uncertainty.
Communication is exchange of information.” (Wiio 1980, p. 18) Information is the
ability to choose between alternatives reliably. Before you send me an email, I cannot
reliably guess your message. After I receive it, I can do so. I have gained information
(www.aip.org).

Uncertainty is the degree to which a number of alternatives, the multiplicity of
options, are perceived with respect to the occurrence of the event and the relative
probability of the outcomes. Uncertainty implies a lack of predictability, of structure
and/or information. This multiplicity of option states can be quantified in terms of entropy.

Entropy and uncertainty can be considered synonymous (Jaynes 1957). Jaynes
made the linkage between statistical mechanics as we know it from (Gibbs 1903), and
entropy as we know it in thermodynamics, by relating a common concept to both:
maximum entropy. Mathematically, maximum entropy has the important property that
no possibility is ignored. It assigns positive weight to every possible situation that is not
absolutely excluded from the information. It is the state where we can deal with
equilibrium properties. According to Jaynes, this is quite similar to an ergodic property.

The macro equilibrium state of a system (this is what we see in classical
thermodynamics) is characterized by the macro equilibrium entropy, S. From Boltzmann,
we get

$S = k\,P(\{p_i\})$    (2.2)

Here S is related to the maximum value of the statistical entropy functional
$P(\{p_i\})$ through the Boltzmann constant k (see footnote 7), where
$P(\{p_i\}) = \ln \Omega$ is the uncertainty. The value of k for {nats, bits, bytes, or
Joules/°Kelvin} is $\{1,\ 1/\ln 2,\ 1/\ln 256,\ 1.38 \times 10^{-23}\}$ respectively.

We can convert the natural log, ln, to $\log_2$ easily.

$\log_2 x = \dfrac{\ln x}{\ln 2}$    (2.3)
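
The choice of k only changes the unit in which the same uncertainty is reported. A minimal sketch, assuming a system of equally likely microstates so that the uncertainty is ln Ω (the function name and the example value of Ω are illustrative):

```python
import math

BOLTZMANN_J_PER_K = 1.38e-23  # value of k in Joules per kelvin, as quoted in the text


def uncertainty(num_microstates: int) -> dict[str, float]:
    """Report k * ln(Omega) for each choice of k listed in the text."""
    ln_omega = math.log(num_microstates)
    return {
        "nats": ln_omega,                    # k = 1
        "bits": ln_omega / math.log(2),      # k = 1 / ln 2
        "bytes": ln_omega / math.log(256),   # k = 1 / ln 256
        "joules_per_kelvin": BOLTZMANN_J_PER_K * ln_omega,
    }


# Example: 256 equally likely microstates.
u = uncertainty(256)
for unit, value in u.items():
    print(f"{unit}: {value:.6g}")
```

For 256 equally likely microstates this gives eight bits, or one byte, of uncertainty, which is the conversion in equation (2.3) applied to ln 256.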

The probability distribution $\{p_i\}$ is on the set of available microstates
$\Omega = \{i\}$, or multiplicity. The functional $S = k\,P(\{p_i\})$ needs to satisfy two
general properties: (i) P must be positive, taking the value zero only in the case of
absolute certainty ($p_i = 0$ for all states, except for a given state j for which
$p_j = 1$); (ii) P must increase monotonically with increasing uncertainty. In addition, a
third condition is required: (iii) P is additive for independent sources of uncertainty
(Bayes 1763), (Planes 2002). Because of this, we have the property of extensibility. This
means if you add or subtract these quantities which contribute to uncertainty, the system
size, the extent, changes. Adding these quantities requires a product of the probabilities.

We can compose a system like this from two independent subsystems, A and B, so
that the set of microstates is $\Omega^{A+B} = \Omega^{A} \times \Omega^{B}$. Each
microstate (i, j) can be specified by fixing a state $i \in \Omega^{A}$ of subsystem A and
a state $j \in \Omega^{B}$ of subsystem B. If the probability density is
$p_{ij}^{A+B} = p_i^{A}\,p_j^{B}$, then $P^{A+B} = P^{A} + P^{B}$ (Planes 2002),
(Munster 1969).

$P(\{p_i\}) = -\sum_{i \in \Omega} p_i \log_2(p_i)$    (2.4)
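
The additivity condition can be checked numerically. The sketch below uses two made-up probability distributions for subsystems A and B, builds the joint distribution as the product of the probabilities, and compares the entropy of the joint system with the sum of the subsystem entropies.

```python
import math
from itertools import product


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical independent subsystems.
p_a = [0.5, 0.25, 0.25]
p_b = [0.9, 0.1]

# Joint distribution over microstates (i, j): product of the probabilities.
joint = [pa * pb for pa, pb in product(p_a, p_b)]

h_a = entropy_bits(p_a)
h_b = entropy_bits(p_b)
h_joint = entropy_bits(joint)

print(f"H(A) = {h_a:.4f}, H(B) = {h_b:.4f}, H(A) + H(B) = {h_a + h_b:.4f}")
print(f"H(A+B) = {h_joint:.4f}")
```

The two printed totals agree to floating-point precision, and a distribution with a single certain state gives zero, in line with conditions (i) and (iii).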

3. Extensive and Intensive Properties 

Extensive properties in the physical world are volume, mass, particles, energy,
money, messages, records, etc. Intensive properties (e.g. pressure and temperature), on
the other hand, are independent of the size of the system. A method to determine whether
a property is extensive or intensive is to divide the system into two equal parts with a
partition. Each part will have the same value for the intensive properties, but half for the
extensive properties. Examples of extensive and intensive properties are given in Figure
II-11.

7 Shannon (1948) quickly points out that k is just a convenient constant to relate to our physical world.


Extensive and Intensive Properties

Extensive properties change with the extent or size of the system
Intensive properties are not affected by the system size

Examples:

Extensive: mass, volume, energy, money, messages
Intensive: temperature, pressure

Figure II-11 Extensive and Intensive properties
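
The partition test can be tried on a toy message system. The sketch assumes a uniform system in which every node holds the same number of messages, so that each half is representative. The node and message counts are invented, and "messages per node" is used as the intensive quantity.

```python
def total_messages(messages_by_node):
    """Extensive quantity: scales with the size of the system."""
    return sum(messages_by_node)


def messages_per_node(messages_by_node):
    """Intensive quantity: independent of the size of a uniform system."""
    return sum(messages_by_node) / len(messages_by_node)


# Hypothetical uniform system: 8 nodes with 12 messages each.
system = [12] * 8

# Partition the system into two equal parts and inspect one half.
half = system[: len(system) // 2]

print("total messages:", total_messages(system), "->", total_messages(half))
print("messages per node:", messages_per_node(system), "->", messages_per_node(half))
```

The total halves while the per-node value is unchanged. For a non-uniform system the two halves would generally differ, so the test as stated presumes something like an equilibrium distribution.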

It would be valuable to identify analogous extensive and intensive properties in
the technology transition model, or in more general terms.

Property       Ext.  Int.  Thermodynamics (Physical)      Tech Transfer / Information
                                                          Communication System

Particle Mass   X          N particles per mole           Unit of entities, e.g. term per
                                                          some standard message length

Volume          X          L^3 (length^3) or              V * S: nodes consisting of
                           A*L (area * length)            authors * state change

Energy          X          eV, Joules, BTUs               Some conserved property;
                                                          messages, terms

Temperature           X    °K (degrees kelvin)            Some measure of change; is
                                                          cardinal; related to two
                                                          variables, ext and/or int

Entropy               X    S > 0                          Similarly defined for information
                           S = k P({p_i})                 (Shannon 1948)
                           S = k ln W                     S = k P({p_i})
                           Always increases               S = -sum_i p_i log2 p_i
                           Additive for independent       Maximum entropy: uniformly
                           identical distributions        distributed probabilities, same
                                                          as thermodynamics

Pressure              X    Force per area                 Messages per node

Density               X    Extensive property per         Messages per v nodes
                           volume                         (sum of v authors)

Table II-6 Property Relationships


Particles are analogous to sets of terms in a message. A message is made up of 
sets of terms. Counting all of the sets of terms is the same as determining the number of 
entities, or particles. Just as with molecules, some entities have more weight than others. If 
all null and single-term sets have the same weight, the analogy is a set of sets of terms, 
e.g. {}, {A}, {B}, {C}, {AB}, {AC}, {BC}, {ABC}. {A} is “lighter” than {AC}, which 
is a composite of two terms, {A}+{C}. There should be some 
relationship between changing the status of a term and analogous principles in the 
physical world, e.g. Newton’s laws (see the next section). 
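
The set of sets of terms can be enumerated directly. The sketch below lists the power set of three terms and assigns each subset a weight; the `weight` rule (the null set and single-term sets both weigh 1, composites weigh their number of terms) is one assumed reading of the statement above, not a definition taken from the models.

```python
from itertools import combinations


def power_set(terms):
    """Return every subset of `terms`, from the null set up to the full set."""
    ordered = sorted(terms)
    return [frozenset(c)
            for r in range(len(ordered) + 1)
            for c in combinations(ordered, r)]


def weight(subset):
    """Assumed weight: null and single-term sets weigh 1, composites weigh more."""
    return max(1, len(subset))


subsets = power_set({"A", "B", "C"})
for s in subsets:
    label = "{" + "".join(sorted(s)) + "}"
    print(label, weight(s))
```

For three terms this yields the eight entities {}, {A}, {B}, {C}, {AB}, {AC}, {BC}, {ABC}, with {A} lighter than {AC}.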

Volume in the physical world is three-dimensional and is measured in units of length 
cubed (l^3). Integration over small l is used in continuous space. For a discrete system, 
we count the points defined in phase space. For the models, this volume is defined in only 
two dimensions, nodes (a publisher) and state points. This is discussed in further detail 
in Chapter III, on page 444. 

In a classical thermodynamics model, energy is measured in joules or BTU. It is 
often convenient to measure energy in electron volts; one electron volt is the kinetic 
energy of an electron that has been accelerated through a voltage difference of one volt. 
This is moving an electron from its status at point A to point B. This is directly related to the 
conservation principle, the 1st law of thermodynamics, and Newton’s 3rd law. The first 
law of thermodynamics says that energy is conserved and transformed. Energy is a 
primitive and essential thermodynamic function. It is a mathematical abstraction 
(Abbott 1989, p1). Newton’s 2nd and 3rd laws are similarly constructed using the principle of 
conservation. 

Law 1 “Every body perseveres in its state of being at rest or of moving uniformly 
straight forward, except insofar as it is compelled to change its state by forces impressed.” 

Law 2 “A change in motion is proportional to the motive force impressed and 
takes place along the straight line in which that force is impressed.” 

Law 3 “To any action [change of state] there is always an opposite and equal 
reaction; in other words, the actions of two bodies upon each other are always equal and 
always opposite in direction” [8] (Newton 1726, p417). 

Newton says in definition 3 of law 1, “because of inertia of matter, it is only with 
difficulty put out of its state either of resting or of moving.” In Newton’s interleaved 
copy of edition 2, he adds the following, which was never printed: “I do not mean 
Kepler’s force of inertia, by which bodies are moved toward rest, but a force of 
remaining in the same state either of resting or moving.” (Newton 1726 p404). Change 
of state, or status, must overcome some inertia, e.g. changing v_0 to v_1, meaning a change 
from an initial state, say a velocity, to a new velocity. Even changing the orientation of 
one atom, or one bit, takes some force or stimulus. Something 
must happen to change the state of information; otherwise it stays in its current state. 

[8] This is the exact statement taken from Newton’s original work. Modern texts have often changed the 
wording slightly on each of his laws, but the original statements give us the closer intent of the law to this 

Figure II-12 below uses a Venn diagram of two sets to show how this conservation 
can be represented through correlations of extensive properties at the intersection, which 
consists of the mutual information. The left-hand subsystem A is composed of the sum of 
the uncorrelated part P(A|B) plus the correlated part I(A;B), which together still equal 
the total P(A), where I(A;B) is the shared mutual 
information. This is the equal and opposite amount required by the 2nd and 3rd laws of 
Newton. Similarly, the right-hand subsystem B is composed of the sum of the 
uncorrelated part P(B|A) plus the correlated part I(A;B), which is still equal to the total 
P(B). Looking at relation 4, I(A;B) = I(B;A), and the other relations in Figure II-12, 
we see how the conservation principle is realized. The key in this research is not 
conservation of energy, but rather the conservation of the correlated components of extensive 
properties in two interacting subsystems (Planes 2002). What one subset loses, the other 
gains. 
Mutual Information and Conservation of 
Extensive Properties 

$P_{A+B} = P_A + P_B$   (1) 

$I(A;B) = P(B) - P(B|A)$   (2) 

$I(A;B) = P(A) + P(B) - P(A,B)$   (3) 

$I(A;B) = I(B;A)$   (4) 

Figure II-12 Mutual Information and Conservation of Extensive Properties 
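
Relations (2) to (4) can be checked numerically for a small joint distribution. The sketch below assumes that the quantity written P(.) in the figure plays the role of Shannon's entropy H(.); the function names and the two sample joint distributions are illustrative only.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """Return I(A;B) computed three ways from joint[a][b] = P(A=a, B=b)."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    h_a, h_b = entropy(p_a), entropy(p_b)
    h_ab = entropy([p for row in joint for p in row])
    h_b_given_a = h_ab - h_a  # chain rule: H(B|A) = H(A,B) - H(A)
    h_a_given_b = h_ab - h_b
    return {
        "relation_2": h_b - h_b_given_a,          # I(A;B) = H(B) - H(B|A)
        "relation_3": h_a + h_b - h_ab,           # I(A;B) = H(A) + H(B) - H(A,B)
        "relation_4_swapped": h_a - h_a_given_b,  # I(B;A) = H(A) - H(A|B)
    }


correlated = [[0.4, 0.1], [0.1, 0.4]]         # illustrative joint distribution
independent = [[0.25, 0.25], [0.25, 0.25]]    # no shared information
print(mutual_information(correlated))
print(mutual_information(independent))
```

All three expressions agree for a given joint distribution, and the shared part vanishes when the two subsystems are independent.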
4. System, Control Volume, System Boundaries, States 

The research refers to system, control volume, system boundary, and states in the 
usual ways. The application of the conservation principle requires a system and 
surroundings, defined as a discrete portion of the universe. A system is any object, any 
quantity of matter, any region of space, etc. selected for study and set apart (mentally) 
from everything else, which then becomes the surroundings. The systems we are 
interested in are finite. There are two points of view, macroscopic and microscopic. 
The macroscopic view takes into account the coarse characteristics of the system, with 
intensive properties regarded as state space coordinates. For example, the T-S 
(temperature-entropy) diagram shown in Figure II-13 also shows a third intensive 
variable, pressure P. Figure II-14 shows a typical P-V (pressure-volume) diagram for the 
same cycle. 

In thermodynamics, there are the concepts of heat, Q, and the companion quantities 
W and H, mechanical work and enthalpy respectively, which are convenient mathematical 
concepts. These are related to an internal energy U. U is a function of the internal 
microstates discussed before: ΔU = Q - W. The change in the internal energy ΔU is the 
difference between the energy put in as heat Q (some stimuli input) and the useful work 
out W [9]. In our model, we are stimulating researchers and developers to produce messages 
that are used. Those that are generated but not used are wasted. This is related to the 
system efficiency η [10]. 

In differential form, ΔU = Q - W is written 

$dU = \delta Q - \delta W$   (2.5) 

All energy exchange with the surroundings, in this case, serves just to change the 
internal energy. If in addition the process is adiabatic (i.e. no heat transfer with the 
surroundings), then δQ = 0, and this becomes 

$dU = -\delta W$ (adiabatic)   (2.6) 

[9] We have left out physical energy terms relating to the physical analogs for kinetic and potential 
energy at the system level. Even in thermodynamic analysis, most common problems do not need these 
energy quantities. 

[10] The word efficiency comes from the Latin “efficax” = effect. In mechanical efficiency, all we are 
interested in is the effect, “work”. In every other kind of efficiency, we take the ratio of ΔE(x), the “energy 
change” actually used to obtain the effect x, to the free “energy” ΔF released (applied) to obtain the effect: 
η(x) = ΔE(x)/ΔF, or η(x) = specified output “energy” change / input “energy” change. 

This says that for a system changed adiabatically from one equilibrium state to 
another, the work should be independent of path; that is, ΔU should depend only on the end 
states. This research will explore that relationship in an example in Chapter III. 

In the case of this research, the relationship between work and internal free 
“energy” states is dealt with as a potential. In statistical mechanics, thermodynamics and 
technology transfer dynamics, there is a potential that is the difference between the macro 
state when the system is at equilibrium, and the current state of the system. We see this 
because in statistical mechanics, the macroscopic property view is based on microscopic 
principles. E.g. equal probability of microstates gives the macroscopic description. 

This potential is realized in a manner similar to the Massieu-Planck 
generalized ensemble potential functions (Munster 1969 Chapter III) and (Planes 2002). 
We hinge on Jaynes’s (1957) relationship that linked Gibbs thermodynamics and statistical 
mechanics to information theory. Accepting that, we have available to us the Gibbs 
postulate that the quantities calculated by thermodynamics are identical to those 
calculated by statistical mechanics. In our case, we can indicate the probability that a 
term is in state j as P_j = N_j/N, where N_j is the number of terms in state j, and N is the total 
number of terms in the system. Similarly we can determine the distribution function for 
velocity P(v). 

So, we can use the concept of ensemble potentials if we are careful about the 
conditions to define equilibrium. 

These potential relations are independent of the conserved quantity; they simply 
relate a current state to an equilibrium state. From these relations, we readily see that the 
free energy is F = U - TS, where U is the internal organizational energy of the system, and 
T and S are the temperature and entropy of the system. S represents a system’s present 
organization. Only part of the system’s total potential is locked up in the present 
organization. The rest of the accessible “energy” states are “free” from the current 
organizational constraint. Gibbs (1906) gave us this for physical systems. Maxwell tells 
us that 

$T \propto \langle KE \rangle \propto \langle v^2 \rangle$   (2.7) 

This says that the higher the absolute average velocity, the higher the temperature. 
In Chapter III, we will present a method for interpreting the velocity (rate of change of a 
state). This, coupled with the partition function, can get us to the proportionality constant 
of temperature. 


[Figure: state space diagram of the intensive properties temperature and entropy, annotated with dU = δQ - δW] 

Figure II-13 Intensive Properties State Space Diagram 

[Figure: state space diagram of extensive volume (V) against intensive pressure (P), with isentropes (S)] 

Figure II-14 State Space P-V (Pressure-Volume) Diagram 


The microscopic view addresses the internal structure and details of the system in 
a series of canonical [11] decompositions. Microstates represent these internal structural 
details and properties. The internal energy U can be related to the multiplicity of microstates 
(Schroeder 2000), (Planes 2002), (Munster 1969). To specify the microstate Ω of a 
system you must specify the state of each individual entity. If we specify the state more 
generally, by saying how many are in a given state, we are referring to a macrostate. The 
number of microstates corresponding to a given macrostate is called the multiplicity of 
that macrostate (Schroeder 2000). For example, assume there are 100 types of coins (an 
alphabet of 100). The total number of microstates is 2^100, since each of the coins has two 
possible states. The total number of macrostates is only 101: 0 heads, 1 head, 2 heads, ... 
up to 100 heads. If there are N coins (in this case 100 different types of coins), the 
multiplicity of the macrostate with n heads is 

$\Omega(N,n) = \frac{N!}{n!\,(N-n)!} = \binom{N}{n}$ or $C^{n}_{N}$   (2.8) 

[11] Canonical means broken down into finite primitive arrangements. The term comes from religious 
heritage, when the Catholic church laid down canon law, meaning that the variables conform to a scheme 
that is both simple and clear. 
The last expression is the standard abbreviation for the number of 
combinations C^n_N of n items chosen out of N. So if we have one each of coins A, B, C, ... 
up to 100 in this example, we have an equal probability when all of the combinations exist 
once. If we have multiple A, B and C coins and only one of each of the remaining types of 
coins, we will have a biased set of possible combinations, and the equilibrium 
is skewed. This research will show that we get a skewed distribution that is biased 
toward pairs and triple sets of terms (possible combinations of primitive message sets) 
and takes the form of Boltzmann’s distribution. What is obvious is that it is VERY unlikely 
that we find combinations outside of the most likely states (Schroeder 2000 Chapter 3), 
(Nash 1972, p12), (Castle 1965 p99). Figure II-15 shows the possibilities for an alphabet 
N of 128 different single types of coins. This is really a VERY, VERY tall, skinny 
distribution. The confidence limits around the mean (the peak) are represented by 
1/√(10^37) ≈ +/-2×10^-19. The number of microstates associated with each of the (N+1) 
configurations is always calculable. Although enumerating the microstates by trial and 
error is always imaginable and feasible in principle, it is wholly impossible in any 
reasonable time when the number of combinations spans many orders of magnitude. 
Therefore, we use the tools of differential calculus and measure some experimental data. 
We see that the predominant number of configurations corresponds to the peak of the curve, 
where the tangent line must lie horizontally. 
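
The counts quoted above can be reproduced directly from equation (2.8). This is a minimal sketch using Python's standard `math.comb`; the variable names are illustrative.

```python
from math import comb, sqrt


def multiplicity(N, n):
    """Equation (2.8): number of microstates in the macrostate with n heads."""
    return comb(N, n)


N = 100
total_microstates = sum(multiplicity(N, n) for n in range(N + 1))
macrostates = N + 1
print(total_microstates == 2 ** N, macrostates)

# Alphabet of 128: locate the peak of the distribution and its height.
peak_n = max(range(129), key=lambda n: multiplicity(128, n))
peak_128 = multiplicity(128, peak_n)
print(peak_n, float(peak_128), 1 / sqrt(peak_128))
```

Summing the multiplicities over all 101 macrostates recovers the 2^100 microstates, and for the alphabet of 128 the peak sits at the midpoint with a height of a few times 10^37.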
[Figure: multiplicity of configurations for an alphabet of N = 128; the vertical axis runs from 0 to 3.0E+37] 

Figure II-15 Distribution of configurations; note that the Y axis is on the order of 10^37 

So the criterion for the predominant configuration is simply that dΩ/dX = 0, where 
dX denotes a change from the predominant configuration to another configuration only 
“infinitesimally” different from it. The changes dΩ and dX are not infinitesimal in the 
absolute sense, but differential calculus demands only that changes be infinitesimal in the 
relative sense. This condition is met with even 10,000 or 100,000 units of multiplicity and 
only a dozen quanta of “energy” states. For a sufficiently large assembly, we can regard 
Ω as an effectively continuous function of the configuration index X, which we refer to as 
q-levels in this research. So we need not be reluctant to use the criterion dΩ/dX = 0 to 
identify the predominant configuration. This follows from any development of quantum 
statistical mechanics (Schroeder 2000), (Nash 1972), (Castle 1965). 
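
The argument that a unit step becomes relatively infinitesimal for a large assembly can be illustrated with the coin multiplicity of equation (2.8). The sketch below is an assumption-laden stand-in: it uses the binomial multiplicity rather than the q-level configurations of the models, and the helper names are hypothetical.

```python
from math import exp, lgamma


def ln_multiplicity(N, n):
    """Natural log of N! / (n! (N-n)!), computed with log-gamma for large N."""
    return lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1)


def predominant(N):
    """Configuration index at which the multiplicity peaks."""
    return max(range(N + 1), key=lambda n: ln_multiplicity(N, n))


def relative_step(N):
    """Fractional drop in multiplicity for one step away from the peak."""
    n = predominant(N)
    return 1 - exp(ln_multiplicity(N, n + 1) - ln_multiplicity(N, n))


for size in (100, 10_000, 100_000):
    print(size, predominant(size), relative_step(size))
```

The fractional change for a single step shrinks roughly in proportion to 1/N, so treating the multiplicity as a smooth function near its peak becomes a better approximation as the assembly grows.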

There is a fundamental assumption in statistical mechanics that in an isolated 
system, all accessible microstates are equally probable. When two systems are in contact, 
for example, a system and the surroundings, we are equally likely to find the combined 
system in any of its accessible microstates. So, we can always compare a distribution 
with the maximum entropy, minimum extensive property distributions using relative 
entropy. It turns out that at equilibrium, the configuration of an isolated macroscopic 
system ensemble is typically that described by the Boltzmann distribution laws (Nash 
1972 p25). If we are seeing a Boltzmann distribution and a number of criteria are met, 
we can estimate the probability of choosing n items out of N. Since there are many 
possible distributions, to know which ones are right in our case, we measure them. We 
count the message subsets of N and N_j in our subsystems and super-system. Then we also 
satisfy two conditions. 

1) The number of messages N in the super-system consisting of subsystem A and 
subsystem B is constant: $N = \sum_j N_j$ 

2) The “energy” of the super-system is constant: $E = \sum_j E_j N_j$ 

where E_j N_j is the “energy” of the j-th level. In the canonical ensemble, we can 
let the mean energy be calculated from statistical mechanics as $\bar{E} = \sum_j p_j E_j$. 
Then we get $E = N\bar{E}$. 

The 0th law of thermodynamics permits comparison of two systems if they are in 
equilibrium with each other. Imagine the two systems are a subsystem and a reservoir. 
This arrangement is essentially how a thermometer works. When the subsystem comes 
into equilibrium with the reservoir via energy exchange, the controlling variable on the 
mean “energy” states is T, the temperature. The resulting equation relates the Helmholtz 
free energy F = U - TS to the logarithm of the partition function Q_c for the case of a 
canonical ensemble. 

$F(T,V,N) = -kT \ln Q_c$   (2.9) 

$Q_c = \sum_{i \in \Omega_{V,N}} e^{-\epsilon_i / kT}$   (2.10) 

where the available microstates are fixed in V, volume (i.e. the number of nodes), 
and N (i.e. the number of messages built from the number, n, of terms - primitive 
messages). In a macroscopic view, we allow the exchange of energy by fixing the mean 
energy and mean number of messages. In these ensembles, the 
intensive and extensive variables are usually taken as natural variables of an energetic 
potential. This contrasts with the microcanonical ensemble, where the 
entropy is taken as the relevant potential. Therefore the basic equations of such 
ensembles are different, and the equations above for the average values and fluctuations 
of the average values are also different (Planes 2002). 
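
Equations (2.9) and (2.10) can be checked against F = U - TS for a small set of levels. The sketch below uses the standard canonical forms; the four “energy” levels and the value of kT are arbitrary illustrative choices, not values from the models, and the entropy is expressed in units of k.

```python
import math


def canonical(levels, kT):
    """Partition function, probabilities, mean energy, S/k and free energy."""
    weights = [math.exp(-e / kT) for e in levels]
    Q = sum(weights)                                   # equation (2.10)
    probs = [w / Q for w in weights]                   # Boltzmann distribution
    U = sum(p * e for p, e in zip(probs, levels))      # mean "energy"
    S_over_k = -sum(p * math.log(p) for p in probs if p > 0)
    F = -kT * math.log(Q)                              # equation (2.9)
    return Q, probs, U, S_over_k, F


levels = [0.0, 1.0, 2.0, 3.0]  # hypothetical "energy" levels
kT = 1.0                       # hypothetical temperature, in energy units
Q, probs, U, S_over_k, F = canonical(levels, kT)
print(probs)
print(F, U - kT * S_over_k)
```

The two printed free-energy values coincide, and the probabilities fall off with increasing level, which is the Boltzmann form referred to in the text.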

5. State Equations 

There are two types of problems we would like to be able to solve. The first 
deals with processes; the equations used relate property 
changes of a system to the amount of a conserved quantity (e.g. energy, mass, money, 
messages, etc.) transferred between the system and its surroundings. The second is the 
elucidation of relationships among the equilibrium properties of a system. We can derive 
these relationships by isolating the flows (heat, work, etc.) dealing with reversible 
processes, and so derive general relationships among equilibrium properties. These 
are no longer limited to the special kind of process initially used for the derivation. The 
properties are called state functions (Abbott 1989 p59). 

6. Stochastic Model and Markov Chains 

There were early efforts to say something about uncertainty over long sequences 
of words, word pairs and phrasings (Shannon 1948), (Mandelbrot 1953). We can assume 
that the terms used in the technology development, as published, represent an analog 
to a piece of continuous prose that is being written. Consider a book that is being 
written and that has reached a length of k words. We can designate the number of 
different words (later we refer to these as terms) that have occurred exactly i times in the 
first k words as f(i,k), or in the notation of equation (2.8), C^i_k. That is, if there are 407 
words that occurred exactly once each, then f(1,k) = 407. We assume that the 
probability that the (k+1)-st word is a word that has already appeared exactly i times is 
proportional to i f(i,k), that is, the total number of occurrences of all the words that have 
appeared exactly i times (Simon 1955). Simon also addresses the addition of terms and 
the disuse of terms. To this researcher’s knowledge, he makes the connection to a 
stochastic process for the first time in the literature. Using terms as symbols 
representing technology, we see that this is a weaker assumption than requiring that the 
probability that a particular word occurs next be proportional to the number of its previous 
occurrences. Also, as Simon and Shannon (Shannon 1948) did, we can assume 
that there is a constant probability that the (k+1)-st word is a new word, a 
word that has not occurred in the first k words. This describes a stochastic process in 
which the probability that a particular word is the next one written depends on the words 
that have been written previously. This is fine if the number of words in the vocabulary 
is roughly constant, or the rate at which terms are added to or die out of a language 
is not significant. In English, for example, this birth/death of terms is small relative to 
the language. 
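
The two assumptions above (reuse proportional to i f(i,k), and a constant probability of a new word) can be simulated in a few lines. This is a sketch only: the parameter name `alpha` for the new-word probability, the text length and the seed are illustrative choices, and the dropping of terms is not modeled.

```python
import random
from collections import Counter


def simon_process(k, alpha, seed=0):
    """Generate k word tokens; words are represented by integer identifiers."""
    rng = random.Random(seed)
    text = [0]
    next_new = 1
    for _ in range(k - 1):
        if rng.random() < alpha:
            text.append(next_new)  # a word not seen in the text so far
            next_new += 1
        else:
            # Copying a uniformly chosen earlier token picks a word with
            # exactly i occurrences with probability proportional to i*f(i,k).
            text.append(rng.choice(text))
    return text


def occurrence_spectrum(text):
    """Map i to f(i,k): the number of different words occurring exactly i times."""
    return Counter(Counter(text).values())


text = simon_process(k=5000, alpha=0.1, seed=1)
f = occurrence_spectrum(text)
print([(i, f[i]) for i in sorted(f)[:5]])
```

Words that occur once dominate the spectrum and the counts fall away quickly for larger i, the skewed shape associated with this process.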

For this technology transition study, we don’t expect the terms relating to a 
technology to be constant. There will be new words added and some will die. Simon 
worked through this by assuming that if one representative of a particular term is dropped, 
then all of the representatives of the term are dropped. He also assumed that the 
probability that the next term dropped is one with exactly a given number of 
representatives equals the relative frequency of terms with that number of representatives 
(Simon 1955). 

This result proves satisfactory for language analysis, and we now have a 
stationary condition to enable use of a chain of transitions. However, it does not quite 
work for a limited-vocabulary artificial language, as is seen in technology transition. It 
will be useful to define a model which conserves a quantity, i.e. as one property decreases 
there is a change in another component, which increases. For example, when two masses 
collide, there is a correlation of velocity: one increases and the other decreases until there is a 
mutual correlation of the shared quantity (in this case energy, which is a function of 
velocity, hence velocity). We shall see that this mutual information and the conditional 
probability of messages and terms give us the quantity that enables the conservation 
principle for the studied models. 

The chain of transitions is useful, however. Stochastic processes of this type are 
known mathematically as Markov processes, and are extensively studied. In a Markov 
process the future evolution of a state depends only on the present state. There is a group 
of Markov processes of significance to information and communication theory. These 
are the ergodic processes (see 2. Ergodic Process, p273), for which, simply stated, 
every sequence produced by the process has the same statistical properties. So the letter, 
word, term or phrase frequencies obtained from particular sequences will approach 
definite limits as the length increases, independent of the particular sequence. The 
ergodic property means statistical homogeneity (Shannon 1948). The limits provided by 
ergodicity permit us to establish a maximum, a reference datum against which comparisons 
can be made. 
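
The statement that frequencies approach definite limits independent of the particular sequence can be illustrated with a small ergodic Markov chain. The two-state transition table below is hypothetical; its stationary distribution is (5/6, 1/6), which follows from balancing the flow between the two states.

```python
import random

# Hypothetical transition probabilities between two "terms", a and b.
P = {"a": {"a": 0.9, "b": 0.1},
     "b": {"a": 0.5, "b": 0.5}}


def simulate(start, steps, seed):
    """Return the observed state frequencies along one sequence."""
    rng = random.Random(seed)
    state = start
    counts = {s: 0 for s in P}
    for _ in range(steps):
        counts[state] += 1
        nxt = P[state]
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return {s: c / steps for s, c in counts.items()}


freq_from_a = simulate("a", 50_000, seed=1)
freq_from_b = simulate("b", 50_000, seed=2)
print(freq_from_a, freq_from_b)
```

Sequences started from different states, and driven by different random draws, settle to nearly the same frequencies, which is the statistical homogeneity the ergodic property describes.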

In the study of technology, we would expect that a researcher, or publisher of a 
message, will use a process of association, reusing terms they have previously written 
by sampling earlier segments of the term sequences they wrote. We would 
also expect that there is a process of imitation, that is, sampling segments of terms from 
other researchers, and from terms heard. 

Consider that the lens we put on the technology yields terms in a slice, of some length, 
of the entire sequence of terms in the technology’s artificial language. We can deal with 
this as a control volume. A control volume establishes boundaries that represent the 
system under study; here it is a slice of the language. There will be further discussion of 
the control volume in Chapter III (see p211). What is required, and addressed, in this 
dissertation is a way to handle the addition of terms across the control volume 
boundaries, and mixing within the control volume. Mandelbrot (Mandelbrot 1953) gave 
us the first hint of what will lead to a dynamical solution. 


7. Information-Communications Theory, Statistical Mechanics 

Insightful developments in information-communication theory (Shannon 1948 
and Jaynes 1957, 1957a) help bring together the two sides of statistical mechanics as used 
in physical systems: "describing the dice" (i.e. the physical description) and "taking 
the best guess" (the gambling theory part). Miller (Miller 1956), Zipf (Zipf 1949), and 
Simon (Simon 1955) tie together information theory, learning and skewed distributions 
respectively. For example, the 1st through 3rd laws of thermodynamics "help describe the 
dice", while the zeroth law, as well as Boltzmann (canonical) and Gibbs (grand canonical) 
factors, are dice-independent tools of statistical inference (Fraundorf 2000) 
(Schroeder 2000). 

The 1st law is the principle of conservation of energy. This deals with the quantity 
to be conserved. 

The 2nd law deals with entropy. It says that the entropy change of any system and 
its surroundings, considered together, is positive and approaches zero for any process 
which approaches reversibility. The second law addresses the quality of the property 
being conserved. It can also be shown that the spontaneous flow of a conserved quantity 
stops when it is at or very near its most likely microstate, that is the maximum entropy 
state (Schroeder 2000 p59), or in equilibrium with another system. The second law can 
also be viewed as a very strong statement about probabilities. 

While it may be initially troubling to the software community to have to think 
about physical properties (software has no physical properties, weight, temperature etc.), 
we can link the constructs of logical and physical space through entropy. Kolmogorov 
(Kolmogorov 1956, 1965) defined and showed various approaches to the quantitative 
definition of information. Li (Li 1997) illustrated applications of Kolmogorov 
complexity. Uspensky (Uspensky 1992) addresses the relationship of entropy and 
varieties of Kolmogorov's complexity [12]. Farmer (1983) showed the relationships of 
dynamical systems, information measures, dimensions and entropy. Prigogine links 
irreversibility, dynamical systems and entropy in distributions as inputs, versus single 
point trajectories. If we can take advantage of this body of knowledge, we as software 
engineers are provided huge leverage in lexicon, theory, and analysis. This ultimately 
provides the potential for accelerating software technology transfer and keeping the 
evolutionary development process under intellectual control. 

[12] Kolmogorov’s complexity is related to Shannon’s entropy, and the notion of randomness. The main 
idea was developed by Bernoulli in 1713, where he stated that if an experiment (recognize that this is what we 
do in technology transfer or evolutionary development) with probability of success p is repeated n times, 
then the proportion of successful outcomes will approach p for large n (Li 1993, p. 55). Bayes put a 
particular definition on the term probability as the measure of an expectation depending on the truth of 
any past fact or the happening of any future event, such that the expectation is more valuable as the fact is 
more likely to be true, or the event is more likely to happen (Bayes 1763, Barnard 1958, p. 298). He also 
suggested the “inverse of Bernoulli’s problem”. Laplace (Li 1993, p. 46) further analyzed the inverse 
probability, as it is also known, and referenced Bayes in his discussion, but this could not be developed 
by Laplace at the time. 


8. Quantitative Zeroth Law 

Let's assume a model of the quantitative version of the zeroth law cited above. 
Here is a theorem of statistical inference not involving energy at all. It applies also to 
thermally unequilibrated systems sharing other conserved quantities provided the only 
prior information we have is how the multiplicity of ways that a quantity can be 
distributed depends on the amount of that conserved quantity to begin with. Since this 
abstraction relationship relies only on the probabilities of the encompassing state 
property, we have a property that depends on the conserved quantity. 


9. Entropy 

Entropy as a concept can readily be seen as logical entropy (think of it as a 
measure of uncertainty, noise, non-signal, process inefficiencies, the percentage of work 
resulting in defects and requiring rework, etc.) and physical or thermodynamic entropy 
(i.e. mixed-up-ed-ness, disorder, disorganization, etc.), which is the quantity of energy not 
available to do work. Logical entropy is Shannon's entropy (S_H) as defined by Shannon 
in his treatise on communication theory (Shannon 1948). Shannon’s theory says that the 
entropy of an information source measures how well its behavior (e.g. the next symbol in 
a sequence it produces) can be predicted. 
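
Shannon's entropy of a source can be estimated from the symbol counts in a sequence, using the relative frequencies P_j = N_j/N as the probabilities. The function name and the sample sequences below are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Entropy in bits per symbol, estimated from relative frequencies N_j/N."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


for sample in ("aaaa", "abab", "abcd"):
    print(sample, shannon_entropy(sample))
```

A source that always emits the same symbol is perfectly predictable and has zero entropy, while equally frequent symbols give the maximum value, log2 of the alphabet size.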

Mixing entropy can be represented by the eigenvalue of a baker's transformation 
function. This baker's transformation in state space represents entropy in terms of 
folding, stretching, translation and rotation (Spiegel 1998 p292). This transformation is 
the representation of a dissipative structure. These are structures with an innate capacity 
to dissipate anything that comes in to disturb the system. The term “dissipate” is 
somewhat unfortunate, because what really occurs is integration, not dissipation 
(O’Murchu 1997 p.168). The entropy is the quantity of information not available to help 
us work, yet it is valuable to understand if the objective is propagation and diffusion. The 
relationships are developed in Chapter III. 
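
As a rough illustration of the stretching and folding, the sketch below implements the standard baker's map on the unit square and measures how fast two nearby points separate. The map definition is the textbook one and is an assumption here; it is not necessarily the form developed in Chapter III. Its stretching factor is 2, so the separation doubles on each step, a rate of one bit per iteration.

```python
import math


def baker(x, y):
    """One step of the baker's map: stretch x by 2, squeeze and fold y."""
    if x < 0.5:
        return 2 * x, y / 2
    return 2 * x - 1, (y + 1) / 2


def circular_distance(a, b):
    """Distance between two x coordinates, treating 0 and 1 as the same point."""
    d = abs(a - b)
    return min(d, 1 - d)


def stretching_rate_bits(x0, delta, steps):
    """Average log2 growth per step of the separation between two points."""
    p, q = (x0, 0.5), (x0 + delta, 0.5)
    for _ in range(steps):
        p, q = baker(*p), baker(*q)
    return math.log2(circular_distance(p[0], q[0]) / delta) / steps


rate = stretching_rate_bits(0.1, 2.0 ** -30, 10)
print(rate)
```

The measured rate equals log2 of the stretching factor, which is one way to read the link between the transformation's eigenvalue and the rate at which information about the initial state is lost.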

Recently a number of undergraduate texts have illustrated entropy as the accessible 
state multiplicity for quantities that must be conserved, e.g. volume and particles. The 
notion of conservation of a quantity is important to this research, as this could be 
momentum or, more importantly, information. This is understood from the 
logical-mathematical interpretation of the equations vs. physical interpretations. It requires us to 
step back and look at conserved quantities in the mathematical sense, then map those to 
our problem. Further, entropy, temperature or coldness (1/T) and heat capacity have been 
developed on the basis of information units alone (Fraundorf 2000). 

10. Learning Curves 

We can associate efficiency with how well we automate the process of acquiring 
knowledge. Learning provides leverage and yields efficiency. When we get efficient, we 
free up cognitive capacity, which in turn permits future learning. A large number of 
papers have examined and reviewed the notion of a learning curve. As early as 1919, 
Thurstone (Thurstone 1919) considered logistic, exponential and hyperbolic functions. 
The log-log form was dismissed by Mazur and Hastie (Mazur 1978), but Newell and 
Rosenbloom did extensive analysis and examined the theoretical basis of the power law. 
They showed that power law learning is like exponential learning when examined in 
terms of the local rate of learning. Newell and Rosenbloom (Newell 1981 p2) state 
“There exists a ubiquitous quantitative law of practice. It appears to follow a [what they 
call a] power law, that is, plotting the logarithm of time to perform a task against the 
logarithm of the number of trials always yields a straight line, more or less.” They refer to 
this as the log-log linear learning law or the power law of practice. They also developed 
a form of the power law to deal with spans of patterns, which appears to take a form that 
may be very relevant to follow-on research. This chunking form of power law 
learning is suggestive. There could be a relationship to the models developed in this text. 
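
The log-log linear form can be illustrated by fitting a straight line to the logarithms of trial number and time. The sketch assumes the simple form T = B * N^(-alpha); the data are synthetic, generated from that same form with made-up parameters, so the fit only demonstrates the procedure and is not evidence for the law.

```python
import math


def fit_power_law(trials, times):
    """Least-squares line through (log N, log T); returns (B, alpha)."""
    xs = [math.log(n) for n in trials]
    ys = [math.log(t) for t in times]
    count = len(xs)
    mean_x, mean_y = sum(xs) / count, sum(ys) / count
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


trials = [1, 2, 4, 8, 16, 32]
times = [10.0 * n ** -0.4 for n in trials]  # synthetic, noise-free data
B, alpha = fit_power_law(trials, times)
print(B, alpha)
```

On real practice data the points would scatter around the line, and the slope would estimate the learning exponent.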
11. Abstraction 


The entropy discussion so far only gives us a logical-mathematical tool set and 
framework. There are some other aspects that need to be addressed. One that is still 
floating around from Plato's vignette about Meno’s learning is the notion loosely referred 
to as “unpacking”. Can this be tied back to entropy and communications as well? This 
seems to suggest a terms-of-reference and an abstraction requirement to minimize the 
effort related to understanding the “encryption” and protocol needed to communicate the 
ideas in this research. 

The user simply needs to know how to use the product, i.e. product-use and 
process-use knowledge. For example, the general population only needs to know how to 
start a car, drive a car (after training), know the reason for fuel, fuel a car and observe 
faults. Concrete acts that require little to no thinking to communicate messages require 
few additional processing steps, and hence offer the least uncertainty or opportunity to add 
noise to the signal. A way to look at this is to create a set of nodes representing states in 
a hierarchy. In some models these can be hierarchical or collector states. 

This representation of states as nodes in the dimension of depth of knowledge is 
not new. In writing Principia Mathematica, Russell and Whitehead (Whitehead 1910) 
were forced to construct a hierarchy of types that would permit logical statements to refer 
to other logical statements. In their theory, a proposition could take the place of a 
variable if it were interpreted as being on a lower level than the meta-statement 
proposition. This relationship of logical hierarchical structures is very powerful here in 
terms of representing the depth of abstraction. 

This is useful because a similar approach is adopted here in order to 
apply the entropy concept. The application of the entropy notion based on hierarchical 
states permits use of the same units required for statistical inference techniques. 

We now have access to a common dimension in the area of abstraction, 
uncertainty, and communication, as well as temperature, for the development of the theory 
and model for software technology transition and evolutionary software development 
(Luqi 1989, Luqi 1991). It also leans in the direction required to represent software 
applications (Berzins 1991). We have an entropy metric that works at a higher level of 
abstraction than is afforded by the system/engine/machine node interpretation. While we 
are counting every message and structural combination in this research for experimental 
purposes, in actual practice we will be able to take samples. We no longer need to resort 
to counting every single particle or message or structure. Abstraction can also be useful 
when mapped to a scale to represent learning and competency for an individual or 
organization. Abstract representations in terms of combinations of terms at higher 
q-levels minimize the effort required to communicate. This means we can unpack a message 
more easily, which enables reliable and efficient processing of messages. 

The hierarchy of types of technology transition, or evolutionary software 
development and/or software says: 

For any selected state of a node, a lower level state diagram may be substituted. 

The proposition implies that it is not necessary to know the state at any level of the 
diagrams, but only their relative levels. This proposition requires that no state at any 
level is the same state as one on a higher or lower level. As with Russell and Whitehead's 
hierarchy, a state has meaning only in context. 


The resulting axiom of reducibility to Whitehead's hierarchy is as follows: 

The static relationships between states are not changed by the 
presence of sub-states. 

In other words, the static probability of a state being active is not changed by the 
presence of its internal states. An important result is that the internal states have 
conditional probabilities, which rely on the probabilities of the encompassing state 
(Grable 1994). While this research does not need to develop this further, all of the tools 
are available as a result of this research to do further analysis using conditional 
probability and Whitehead’s hierarchy of states. In Chapter III, the impact of abstraction 
can be seen: reducing the amount of complexity in our representations results in increased 
understanding. 
D. RELATION TO TECHNOLOGY TRANSFER 


We can now start to see some of the elements that will constitute the technology 
transition model. It is clear we need to reflect the human in the process. The technology 
transfer literature focuses heavily on human learning. Uncertainty reduction is 
achieved through learning and the execution of informational activities (communication 
of a message of some sort) and through irreversible combinatorial interactions and 
mixing (again, in this context, a combination of input signals by some process that 
generates an output). These are, at a minimum, probabilistic, involving individuals that 
reduce uncertainty by performing an informational activity in the form of learning. 
Chance also plays a role. Fundamental to these ideas of learning and chance is 
communication. Both of these activities can be represented in terms of probabilities. 

1. Leverage of Terms of Reference 

The ability to bridge these two previously disconnected views of a physical and 
non-physical world conveniently provides powerful analytical tools to the software 
engineer. This is a nontrivial contribution to the software engineering community: we 
can put methods in the hands of software engineers that can be readily grasped by the 
mechanical, electrical, or communication engineer or anyone who has had some basic 
physics. This reduces the barriers to use by lowering the effort required to unpack, 
decipher and understand the protocol for the user community. 

2. Software Technology Transfer and Evolutionary Development 

This research makes an initial suggestion that software development, especially 
an unprecedented system development using an evolutionary, risk reductive approach, is 
very similar to the process of software technology transfer. This process is one of 
discovery, maturing thoughts on the application, fusing existing domain knowledge, and 
advancing the particular body of knowledge represented in the software product. The 
body of knowledge advances when the prototypes, demo units and final product are 
delivered to the user community. 

These two classes of processes, technology transfer and the evolutionary or spiral 
development model (MIL-STD-498 and Boehm 1988), are heavily laden with 
probability, and are primarily driven by external factors and the large proportion of 
human activity. We shall develop those points throughout this discourse on software 
technology transfer, and point out the analogs in the software development process, 
specifically in the case of evolutionary development. 

The research suggests that these two cases are related. Similarly, the development 
of the theory will always keep an eye on an interesting challenge: will the theory hold 
for software development and possibly for software itself? If this holds, we may very 
well have the first inroad into the development of what this researcher calls Software 
Physics. This research leaves until the end the speculation that software, a process itself, 
albeit a deterministic and predictable process, is similar in nature to the technology 
transfer and spiral development process with all of the uncertainty reduced or 
degenerated out of the framework. In the later sections of this discourse, these 
relationships will be more fully developed. 
III. METHOD AND MODEL 


A. METHOD AND MODEL DEVELOPMENT - FUNDAMENTALS 


This chapter will review information theory fundamentals required to develop the 
various entropy models. A macro level basic entropy model is developed showing the 
trends of entropy $S_H$ versus time step k. Then a closed system consisting of two interacting 
subsystems is discussed. Here we show the relationship between extensive and intensive 
quantities. This permits developing a state equation relationship between properties. 

A one-dimensional state space representation in the form of a dynamical map is 
developed. The data is related using the one-dimensional dynamical equation 
$S_{H_{k+1}} = F(S_{H_k})$, where $S_{H_k}$ is the input and $S_{H_{k+1}}$ is the output entropy at the macro 
level. The significance of stability and of the Lyapunov number for such a dynamical 
system is discussed. A two-dimensional finite difference map is introduced: 

$$
\begin{aligned}
S_{H_{k+1}} &= F(S_{H_k}, N_{i_k}) \\
N_{i_{k+1}} &= G(S_{H_k}, N_{i_k})
\end{aligned}
\qquad (3.1)
$$

where $N_{i_k}$ is the number of messages at time step k. The subscript i is indicative 
of a performance band. Performance bands permit the partitioning of the community into 
groups of organizational nodes which possess statistically similar characteristics. One 
possibility is to put all of the organization nodes and their associated authors that produce 
within ±1σ of the mean number of messages per time step together, the +2σ 
performing organizations together, and the +3σ performing organizations together. One 
could follow the development of Boltzmann and subdivide the population into bins of ever 
decreasing size. We do, however, have to be careful not to make the bin size too 
small. If it is too small, the statistical significance of the bin contents is lost and the 
probability distribution inside the bin will reduce to a single message trajectory. 
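
As a minimal sketch of the banding idea, assume made-up message counts per organization, and assume that a band is simply the number of standard deviations (rounded up) that a node sits from the mean. Neither assumption comes from the experiments in this research; the function name, organization names, and counts are hypothetical.

```python
import math
import statistics


def performance_band(count, mean, sigma):
    """Return 1, 2, 3, ... for the sigma band that `count` falls in.

    Assumption for illustration: band b holds nodes whose message count
    lies between (b-1) and b standard deviations from the mean.
    """
    if sigma == 0:
        return 1  # every node is identical, so there is a single band
    z = abs(count - mean) / sigma
    return max(1, math.ceil(z))


# Hypothetical messages per time step for five organizational nodes.
counts = {"org_a": 10, "org_b": 12, "org_c": 11, "org_d": 9, "org_e": 30}

values = list(counts.values())
mean = statistics.mean(values)
sigma = statistics.pstdev(values)

bands = {org: performance_band(n, mean, sigma) for org, n in counts.items()}
print(bands)
```

With so few nodes the outer bands hold a single organization, which is the loss of statistical significance the text warns about when bins become too small.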
Finally, a feedback model is introduced. Here a dynamical system of equations 
models the introduction of new information and the understanding of previously existing 
information. This is considered at the organizational node level of interaction. The 
eigenvalue of the feedback model also represents an entropy. We will see that a tuning 
parameter permits closely aligning the dynamical system model trajectory toward 
stability over time with the macro level information theoretic model. This tuning 
parameter might be viewed as relating to the learning rate. 


B. INFORMATION THEORY - SHANNON’S ENTROPY 

Informally, information measurement can be understood as follows: anything that 
increases the variance also increases the information. Variance is usually 
stated in units of measure, e.g. meters, volts, etc. The amount of information is a 
dimensionless quantity. When we have a large variance, we are very ignorant about what 
is going to happen. If we are very ignorant, then when we make an observation, it gives 
us a lot of information. On the other hand, if the variance is small, we know in advance 
of our observation how the result is likely to come out; hence, we get little information 
from making the observation. 

Shannon (Shannon 1948) best explained entropy in a theory that assigns a 
quantity of information to an ensemble of possible messages. With all messages in the 
ensemble being equally probable, this quantity is the number of bits needed to count all 
possibilities. This says that each message in the ensemble can be communicated using 
this number of bits. However, it does not say anything about the number of bits needed 
to convey any individual message in the ensemble. So this approach can be reasonably related to a 
technology message. It could be simple and count as a message, or as theory in a paper or 
demonstration. 
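
To make the counting argument concrete, the following sketch computes the number of whole bits needed to index every member of an equally probable ensemble. The function name is an illustrative assumption, not part of the models developed here.

```python
def bits_to_count(ensemble_size: int) -> int:
    """Whole bits needed to give each of `ensemble_size` messages a distinct index."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one message")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, with no float error.
    return (ensemble_size - 1).bit_length()


for size in (1, 2, 5, 16, 1024):
    print(size, bits_to_count(size))
```

An ensemble of 16 equally probable messages needs 4 bits per message, whatever the individual messages happen to contain.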

Shannon was interested in the problem of communicating a message between a 
sender and receiver under the assumption that the universe of possible messages is 
known to both the sender and receiver (Li 1993, p. 61). 
Technology maturation feels intuitively to be the stabilization of knowledge, 
based on prior information communicated in messages about a problem to solve. As with 
long-run empirical evidence of dice throws in gambling houses or death statistics in 
insurance companies, technology maturation similarly suggests that random frequencies 
are apparently convergent. But it is clear that no empirical evidence can be given for the 
existence of a definite limit for the relative frequency. Yet the Bayesian approach 
quantifies the intuition that if the number of trials n is small then the inferred distribution 
(the future prediction) depends heavily on the prior distribution. However, if the number 
of trials is large, then irrespective of the prior distribution, the inferred probability 
condenses more and more around the true probability p. 
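
The Bayesian intuition above can be illustrated with a small sketch. It assumes a Beta prior over the success probability of a Bernoulli trial, which is a standard textbook choice and not something taken from this research; the prior parameters and trial counts are made up.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a Bernoulli success rate under a Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Two strongly disagreeing priors: one expects ~0.1, the other ~0.9.
pessimist = (1, 9)
optimist = (9, 1)

# Few trials: the inferred probability still depends heavily on the prior.
few = [posterior_mean(a, b, successes=2, trials=4) for a, b in (pessimist, optimist)]

# Many trials at the same observed frequency: both condense around 0.5.
many = [posterior_mean(a, b, successes=5000, trials=10000) for a, b in (pessimist, optimist)]

print(few, many)
```

With 4 trials the two estimates differ by more than 0.5; with 10000 trials they agree to within 0.001.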

Now suppose we have a technology we wish to implement, i.e. a problem to solve. 
If there is already a lot of experience, then we either know exactly how to solve the 
problem, or we know the frequency of success for different possible methods. However, 
if the problem has never occurred before, or has occurred only a limited number of times, the prior 
distribution is unknown or of limited value. Solomonoff proposed a universal prior 
probability. The idea is that the universal probability serves as well as the true prior 
probability. In reality, we may not have a known “prior” for a technology. So 
we can define a start point as the probability that a fixed reference Turing machine 
outputs a sequence starting with x when the input is supplied by fair coin tosses (Li 1993, p. 58). 
In other words, we can start anywhere. Over time, sequences and sets of sequences will 
develop. Almost all infinite strings (sets of sequences, i.e. messages) are irregular and 
satisfy all of the regularities of stochastic randomness. 

Shannon does not capture the information content of the individual object 
(message) between a sender and receiver. He recognizes that “messages have meaning 
[... however ... ] the semantic aspects of communication are irrelevant to the engineering 
problem” of communication between a sender and receiver (Shannon 1948). 
Kolmogorov’s algorithmic complexity is a measure of the information content of the 
individual object (message) (Li 1993, p. 61). He shows that the complexity measure is 
related to the length of a message and prefix. 
1. Entropy Review 


The definition of entropy here is related to the definition of entropy in 
thermodynamics. Appendix A, Information Theory (p. 274), provides a basic review 
of entropy in information theory after Shannon, Jaynes, Kolmogorov, Uspenski, and 
others as found in Li (Li 1993) and Cover (Cover 1991). The basic entropy equations in 
this section, and in the following sections on maximum, joint, conditional and relative 
entropy, closely follow the development by Cover (Cover 1991). The basic probability 
relationships on which the entropy relations are built can, however, be clearly seen in Bayes' original 
work (Bayes 1763). 

Let X be a discrete random variable with alphabet $\Xi$ and a probability mass 
function $p(x) = \Pr\{X = x\}$, $x \in \Xi$. The notations p(x) and p(y) refer to two different random variables and 
are in fact two different probability mass functions, $p_X(x)$ and $p_Y(y)$. For the alphabet, with 
the given probability mass function, the definition of information entropy is: 

$$S_H(X) = -\sum_{x \in \Xi} p(x) \log_2 p(x) \qquad (3.2)$$

$S_H$ is the entropy measured in bits, and the log is base 2. Log base 2 will be assumed 
throughout unless otherwise noted. 

The base of the log is two for the natural units of information entropy as 
developed by Shannon (Shannon 1948). The entropy is a function of the distribution of 
X. It does not depend on the actual values taken by the random variable X, but only on 
the probabilities. 

If $X \sim p(x)$, which means that the probability of use of the random variable is 
representative of the element’s usage over the alphabet, then the expected value E of a 
random variable g(X) is denoted 

$$E_{p(x)}\, g(X) = \sum_{x \in \Xi} g(x)\, p(x) \qquad (3.3)$$

The entropy of a random variable X can be interpreted as the expected value 
of $\log \frac{1}{p(X)}$, where X is drawn according to the probability mass function p(x). Thus 

$$E_{p(x)} \log \frac{1}{p(X)} = \sum_{x \in \Xi} \log \frac{1}{p(x)}\, p(x) = -\sum_{x \in \Xi} \log p(x)\, p(x) = S_H \qquad (3.4)$$
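
The reading of entropy as an expected value can be checked numerically. The sketch below is only an illustration: the function name and the four-symbol probability mass function are assumptions chosen so that the arithmetic is easy to follow by hand.

```python
import math


def shannon_entropy(pmf):
    """S_H = E[log2(1/p(X))] in bits; zero-probability outcomes contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in pmf.values() if p > 0)


# Hypothetical distribution over a four-symbol alphabet.
pmf = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

# 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75 bits
print(shannon_entropy(pmf))
```

Relabeling the symbols leaves the result unchanged, which reflects the point that entropy depends only on the probabilities and not on the values taken by X.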

2. Maximum Entropy - Equal Probabilities 


Here is an example. Let us have a system where there are only two choices, 

$$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases} \qquad (3.5)$$

then 

$$S_H(X) = -p \log p - (1 - p) \log(1 - p) = S_H(p) \qquad (3.6)$$

We see that $S_H$ = 1 bit when p = 1/2. Figure III-1 shows the basic properties of 
entropy. It is a concave function of the distribution and equals 0 when p = 0 or 1. This 
makes sense because when p = 0 or 1, the variable is not random and there is no 
uncertainty. The entropy is maximum when p = 0.5, which corresponds to the maximum 
uncertainty. 
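
The two-choice case can be tabulated directly. This is a small sketch of equation (3.6) with base-2 logs; the function name and the grid of probabilities are illustrative assumptions.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-outcome variable with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no uncertainty
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


grid = [i / 10 for i in range(11)]
curve = [binary_entropy(p) for p in grid]
for p, s in zip(grid, curve):
    print(f"p={p:.1f}  S_H={s:.4f}")
```

The printed values rise from 0 at p = 0 to 1 bit at p = 0.5 and fall symmetrically back to 0 at p = 1.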
[Figure: Entropy vs. Probability. Plot of entropy $S_H$ against probability p, annotated with the definitions of entropy and expected value.] 

Figure III-1 Entropy vs. Probability 


Consider a system where input signals $X \in \Xi$. Specifically, X is a set of terms, where 

$$T = \{\text{term}\}, \qquad \Xi = \{\text{msg}\} \qquad (3.7)$$

and $\Xi$ is the set of all the subsets of T, often called the power set. Here is an 
example. 

$$T = \{A, B, C, D\}$$

$$
\Xi = \left\{
\begin{array}{l}
\{\}, \\
\{A\}, \{B\}, \{C\}, \{D\}, \\
\{A,B\}, \{A,C\}, \{A,D\}, \{B,C\}, \{B,D\}, \{C,D\}, \\
\{A,B,C\}, \{A,B,D\}, \{B,C,D\}, \{A,C,D\}, \\
\{A,B,C,D\}
\end{array}
\right\} \qquad (3.8)
$$

Now when the number of elements is $|T| = 4$, we get $2^{|T|} = 2^4 = 16$. Note also 
the distribution of sets. We have one null set, {}. We have four sets of singles. We 
have six sets of pairs. There are four sets of triples and finally one set of quadruples. 
Each of these groups is referred to as a q-level. 
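
The grouping of the power set into q-levels can be enumerated mechanically. The following is a minimal sketch for the four-term example; the function name is an assumption made for illustration.

```python
from itertools import combinations


def q_levels(terms):
    """Group every subset of `terms` by its size q (the q-level)."""
    ordered = sorted(terms)
    return {
        q: [set(combo) for combo in combinations(ordered, q)]
        for q in range(len(ordered) + 1)
    }


levels = q_levels({"A", "B", "C", "D"})
counts = [len(levels[q]) for q in sorted(levels)]
total = sum(counts)

print(counts)  # number of sets at q = 0, 1, 2, 3, 4
print(total)   # size of the power set
```

For four terms this gives one null set, four singles, six pairs, four triples and one quadruple, 16 sets in all.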

The maximum entropy occurs when we have an equal distribution of terms. So, 
for a message set where each subset of terms appears only once we define $S_H$ as 

$$S_H = -\sum_{x \in \Xi} \frac{1}{|\Xi|} \log \frac{1}{|\Xi|} \qquad (3.9)$$

The entropy maximum is at $1/p(X)$ or $|\Xi|$, the number of sets of terms in the 
alphabet $\Xi$. In Figure III-2, we see the effect of sets of terms that are evenly distributed. 
In our model, we would not expect to see $0.5 < p(X) < 1$, as a result of the integer number of 
sets of terms. This is because when we make a decision between two choices, one set of 
terms and another set of terms (an integer quantity), that yields a probability of 0.5. If we 
have one choice, one set of terms, we are certain of the answer, and the probability is 1/1, 
so by definition $S_H = 0$. 
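
A quick numerical comparison shows why the even distribution is the maximum. The sketch assumes the 16-message power set of the four-term example; the skewed distribution is made up purely for contrast.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


n_messages = 16  # size of the power set for a four-term alphabet

# Every message appears exactly once: p = 1/16 for each.
uniform = [1 / n_messages] * n_messages

# Hypothetical skewed usage: one message takes half of all occurrences.
skewed = [0.5] + [0.5 / (n_messages - 1)] * (n_messages - 1)

s_uniform = entropy_bits(uniform)
s_skewed = entropy_bits(skewed)
print(s_uniform, s_skewed)
```

The uniform case gives log2(16) = 4 bits, and any departure from equal probabilities gives less.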
[Figure: Maximum Entropy. Entropy (bits) plotted against $1/|\Xi|$, i.e. p(X).] 

Figure III-2 Even distribution of terms yields maximum entropy 


The example vocabulary above, with an alphabet of |4| has a distribution of sets as 
seen in Figure III-3. 
[Figure: Set of Sets Distribution] 



Alphabets of |4| or |8| are tractable. Combinations available for |32| are already 
intractable. In Figure III-4 the log of the frequency is plotted as a function of 
the combinations available in each q-level (sets of singles, doubles, triples, n-tuples). This 
illustrates how quickly the number of combinations of sets grows, and hence how the probability of selecting any one 
set is reduced. 
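
The growth described above can be reproduced with binomial coefficients. This sketch is an illustration only; the function name is assumed, and the alphabet sizes are the ones mentioned in the text.

```python
from math import comb, log10


def q_level_counts(alphabet_size):
    """Number of distinct sets at each q-level for an alphabet of the given size."""
    return [comb(alphabet_size, q) for q in range(alphabet_size + 1)]


for size in (4, 8, 32):
    counts = q_level_counts(size)
    peak = max(counts)
    print(
        f"|{size}|: total sets = {sum(counts)}, "
        f"largest q-level = {peak}, log10 = {log10(peak):.2f}"
    )
```

For an alphabet of |32| the middle q-level alone holds about 6 x 10^8 sets and the whole power set about 4.3 x 10^9, so the chance of drawing any particular set is correspondingly small.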
[Figure: Set of Sets Distribution, plotted against q-levels.] 

Figure III-4 Distribution of sets of sets (combinations) in an alphabet 


It is appropriate to consider additional possible states that can occur. These would 
include pairs of terms, triples, etc., until the sets of sets of terms are exhausted. 
Recognize that q-levels contain sets of subsets of length |q|. Their distribution 
indicates the most probable available states. The q-level contents have distributions and 
different “weights”: 
q level   sets                              distribution   “weight” 
q=0       {}                                1              0 = 0*1 
q=1       {A}, {B}, {C}, {D}                4              4 = 1*4 
q=2       {{A}{B}}, {{A}{C}}, {{A}{D}},     6              12 = 2*6 
          {{B}{C}}, {{B}{D}}, {{C}{D}} 
q=3       {{A}{B}{C}}, {{A}{B}{D}},         4              12 = 3*4 
          {{B}{C}{D}}, {{C}{D}{A}} 
q=4       {{A}{B}{C}{D}}                    1              4 = 4*1 


The “weight” of a set in $q_4$ is greater than that of a set in $q_1$, e.g. $\{\{A\}\{B\}\{C\}\{D\}\}_4 > \{A\}_1$. The weight of 
a level is the product of the level number, which tells us how many terms were combined in a subset, and 
the number of sets in the level. We refer to $Q_i$ as the weight of the q=i level. The weight of all 
of the levels summed is $Q_c$. Every one of these sets of sets is considered a message in our 
models. To move a message from one q-level state to another requires some stimulus. 
We interpret this in the same way that Newton laid out his second law. 
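
The level weights tabulated for the four-term alphabet can be computed as follows. This is a sketch under the definition given in the text (level number times number of sets in the level); the function and variable names are assumptions.

```python
from math import comb


def level_weights(n_terms):
    """Weight Q_i of each q-level: level number times the number of sets in it."""
    return [q * comb(n_terms, q) for q in range(n_terms + 1)]


weights = level_weights(4)  # Q_0 .. Q_4 for the alphabet {A, B, C, D}
q_c = sum(weights)          # combined weight of all levels

print(weights, q_c)
```

For four terms the weights are 0, 4, 12, 12 and 4, which sum to 32. This matches the closed form n * 2^(n-1), since each term appears in exactly half of all subsets.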


[Figure: Distribution of Combinatorial sets of terms, plotted against q-levels.] 

Figure III-5 Distribution of Combinatorial sets of terms 
Technology sample sets would never have message sequences that are infinitely 
long. We are always looking at only a subset of the infinite set of sequences. They are 
limited by the view we take through a record identifier, an abstract, article or other work 
product. Ultimately, in the real world, the message length window is limited. 

Technology samples have alphabets on the order of |1024| or more. The 
probabilities of pulling a set out of the sample alphabets of |4| and |32| are shown in Figure 
III-4 and Figure III-5. It would take a very large, but not infinite, 
number of messages (sets of sets) to reach maximum entropy when all of the terms are 
equally distributed. 

Maximum entropy is a mathematical construct that defines equilibrium. It is 
similar to absolute zero in the temperature of a physical system. In a practical sense, it 
is not really attainable in reasonable time scales for natural events. In a physical system, at 
absolute zero, we have minimum change in energy. 

We expect that in our sample, relevant terms will be used increasingly. This will 
always skew the distribution to the left, to lower q-levels. We are never likely to get an 
equal distribution of terms, but in principle, it could happen. 

This means that the theoretical maximum entropy is never reached in reasonable 
time. The maximum entropy concept is useful only as a reference to compare 
against. This implies we need a mechanism to determine relative entropy. 

We consider each of these subsets to be the primitive messages in this research. We 
get the count of all of the permutations for triples, quadruples, etc. These determine the 
composite sets-of-sets message data points in each technology sample. The total count of 
all of the terms found in a time step is used to determine the maximum entropy. 

Let’s now introduce the definitions for joint and conditional entropy and mutual 
information. These are key facets of the technology transfer models proposed. 
3. Joint Entropy 

In defining the joint entropy $S_H(X,Y)$, a pair of discrete random variables (X, Y) with a joint 
distribution p(x,y) can be considered to be a single vector-valued random variable. The 
joint probability p(x,y) is defined as the probability of the joint occurrence of 
event X=x and event Y=y. This leads to 

$$S_H(X,Y) = -\sum_{x \in \Xi} \sum_{y \in \Psi} p(x,y) \log p(x,y) \qquad (3.10)$$

which can also be expressed as 

$$S_H(X,Y) = -E \log p(X,Y) \qquad (3.11)$$

4. Conditional Entropy 

The conditional entropy of a random variable given another is defined as the 
expected value of the entropies of the conditional distributions, averaged over the 
conditioning random variable. If $(X,Y) \sim p(x,y)$, the conditional probability $p(X|Y)$ is that of 
outcome X=x given outcome Y=y for random variables (not necessarily independent). 
The conditional entropy $S_H(Y|X)$ is 

$$S_H(Y|X) = \sum_{x \in \Xi} p(x)\, S_H(Y|X = x) \qquad (3.12)$$

$$= -E_{p(x,y)} \log p(Y|X) \qquad (3.13)$$

This is shown in the Venn diagram in Figure III-6. The mutual information is 
given as I(X;Y). 
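
The joint and conditional entropy definitions can be exercised on a small joint distribution. The sketch below uses a made-up joint probability table over two symbols each for X and Y, and checks the chain rule that the joint entropy equals the entropy of X plus the conditional entropy of Y given X. All names and numbers are illustrative assumptions.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginal_x(joint):
    """Marginal p(x) from a joint table keyed by (x, y)."""
    px = defaultdict(float)
    for (x, _y), p in joint.items():
        px[x] += p
    return dict(px)


def conditional_entropy_y_given_x(joint):
    """S_H(Y|X) = sum over x of p(x) * S_H(Y | X = x), as in equation (3.12)."""
    total = 0.0
    for x, p_x in marginal_x(joint).items():
        if p_x == 0:
            continue  # an impossible x contributes nothing
        conditional = [p / p_x for (xi, _y), p in joint.items() if xi == x]
        total += p_x * entropy(conditional)
    return total


# Hypothetical joint distribution p(x, y).
joint = {("a", "u"): 0.5, ("a", "v"): 0.25, ("b", "u"): 0.125, ("b", "v"): 0.125}

s_xy = entropy(joint.values())
s_x = entropy(marginal_x(joint).values())
s_y_given_x = conditional_entropy_y_given_x(joint)

print(s_xy, s_x, s_y_given_x)
```

Here the joint entropy is 1.75 bits, and it splits into the entropy of X plus the remaining uncertainty about Y once X is known.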
[Figure: Mutual Information and Entropy. Venn diagram annotated with 
(1) $I(X;Y) = S_H(X) - S_H(X|Y)$ 
(2) $I(Y;X) = S_H(Y) - S_H(Y|X)$] 

Figure III-6 Mutual Information, Joint and Conditional Entropy 


Referring to Figure III-6 for the models proposed, the entropy of the vocabulary 
of terms at time step k is the input entropy $S_H(X)$. The joint entropy $S_H(X,Y)$ is the 
cumulative entropy at time step k+1. $S_H(Y)$ is the incremental contribution of the 
time step k+1. The mutual information, I(X;Y), can be calculated from equation (3) in 
Figure III-6, given the data for the input entropy, the incremental contribution, and the 
joint entropy. Using Figure III-6, equations (2) and (3), the conditional entropy is readily 
computed. 

$$S_H(X) + S_H(Y) - S_H(X,Y) = S_H(Y) - S_H(Y|X) \qquad (3.14)$$

Notice how $S_H(Y)$ drops out of the equation as we rearrange and get 

$$\underbrace{S_H(X,Y)}_{S_{k+1}\ \text{(joint)}} - \underbrace{S_H(X)}_{S_k\ \text{(input)}} = \underbrace{S_H(Y|X)}_{\Delta S_{k+1}\ \text{(incremental new information)}} \qquad (3.15)$$
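
The bookkeeping in equation (3.15) can be sketched as a running difference of cumulative entropies. The example assumes, for illustration only, that the probability of a term is its relative frequency in the cumulative pool of terms; the term lists per time step are made up.

```python
import math
from collections import Counter


def entropy_bits(term_counts):
    """Entropy in bits of the relative term frequencies in a Counter."""
    total = sum(term_counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in term_counts.values() if c > 0
    )


# Hypothetical terms observed in three successive time steps.
steps = [
    ["ada", "compiler"],
    ["ada", "tasking", "runtime"],
    ["ada", "compiler", "generic", "runtime"],
]

cumulative = Counter()
s_input = 0.0      # S_k, the input entropy before the first step
increments = []    # Delta S_{k+1} for each step

for terms in steps:
    cumulative.update(terms)
    s_joint = entropy_bits(cumulative)  # S_{k+1}, the cumulative entropy
    increments.append(s_joint - s_input)
    s_input = s_joint                   # the joint entropy feeds the next step

print(increments)
```

The increments sum to the final cumulative entropy. An increment can be negative when a step mostly repeats terms that are already common, which skews the distribution.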
The joint information is the cumulative entropy computed at time step k+1. This 
will also be the input to the next time step. The input is the pool of information 
(persistent messages with their constituent terms) available to the producer. On the right 
hand side of the equal sign is the incremental addition of new information. 

Recall from Chapter 2 that this is the feature that makes the information model, 
which includes a social system, different from a thermodynamic system. In a 
thermodynamic system with physical particles, the important feature of stochastic 
dynamics is the local, short-range character of the interactions. In the physical system, 
the number of transactions going on per unit time in a system of size N must be 
proportional to the size. That is, each element can only sense its neighbors. In this 
system, which includes nodes of people and machines constituting a social system, this 
local property has to be redefined. Local is not geographically local as in a volume; 
rather, the volume is defined as what is accessible by direct contact via a graph. Each element 
can simultaneously sense all of the other elements present and reachable. The influence 
from external sources studied by Allen (Allen 1977, 1983) is amplified. This leads to 
transition rates proportional to $N^a$, where the exponent a may be larger than unity. 


5. Relative Entropy 

Relative entropy, or the Kullback-Leibler distance, between two probability mass functions 
p(x) and q(x) is defined as 

$$D(p\|q) = \sum_{x \in \Xi} p(x) \log \frac{p(x)}{q(x)} \qquad (3.16)$$

$$= E_p \log \frac{p(X)}{q(X)} \qquad (3.17)$$

Similar to earlier developments, we use the convention, based on continuity 
arguments, that $0 \log \frac{0}{q} = 0$ and $p \log \frac{p}{0} = \infty$ 
(Cover 1991, p. 18). 
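
A short sketch of the relative entropy, including the two conventions just stated, is given below. The distributions are made up, and the function name is an assumption.

```python
import math


def kl_divergence(p, q):
    """D(p||q) in bits for two pmfs given as dicts over the same alphabet."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue  # convention: 0 * log(0/q) = 0
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf  # convention: p * log(p/0) = infinity
        total += px * math.log2(px / qx)
    return total


# Hypothetical term-usage distributions for two technologies.
p = {"A": 0.5, "B": 0.5}
q = {"A": 0.25, "B": 0.75}

print(kl_divergence(p, q), kl_divergence(q, p))
```

The two directions give different values, which is why relative entropy is not a true distance even though it is zero only when the distributions coincide.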


While it is not a true distance between distributions, it is useful to think of relative 
entropy as a “distance” between distributions. The mutual information, which was 
introduced before, is the measure of the amount of information that one random variable 
contains about another random variable. It is the reduction in the uncertainty of one 
random variable due to the knowledge of the other. Assume we have two random 
variables X and Y with a joint probability mass function p(x,y) and marginal probability 
mass functions p(x) and p(y). The mutual information I(X;Y) is the relative entropy 
between the joint distribution and the product distribution p(x)p(y), i.e., 


$$I(X;Y) = \sum_{x \in \Xi} \sum_{y \in \Psi} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} \qquad (3.18)$$

$$= D\big(p(x,y)\,\|\,p(x)p(y)\big) \qquad (3.19)$$

$$= E_{p(x,y)} \log \frac{p(X,Y)}{p(X)p(Y)} \qquad (3.20)$$

It is important to see that the mutual information is symmetric, I(X;Y) = I(Y;X): 

$$I(X;Y) = S_H(X) - S_H(X|Y) \qquad (3.21)$$

The mutual information I(X;Y) is the reduction in uncertainty of X due to 
knowledge of Y. By symmetry, it follows that 

$$I(Y;X) = I(X;Y) = S_H(Y) - S_H(Y|X) \qquad (3.22)$$

That is, X says as much about Y as Y says about X. Since 
$S_H(X,Y) = S_H(X) + S_H(Y|X)$ we have 

$$I(X;Y) = S_H(X) + S_H(Y) - S_H(X,Y) \qquad (3.23)$$

Also we see that 

$$I(X;X) = S_H(X) - S_H(X|X) = S_H(X) \qquad (3.24)$$

The mutual information of a random variable with itself is the entropy of the 
random variable. 
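
These identities can be verified numerically. The sketch below computes the mutual information from the relative-entropy form (3.18), compares it with the entropy form (3.23), and checks symmetry and the self-information case (3.24). The joint distribution and all names are illustrative assumptions.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginals(joint):
    """Marginals p(x) and p(y) from a joint table keyed by (x, y)."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return dict(px), dict(py)


def mutual_information(joint):
    """I(X;Y) as the relative entropy between p(x,y) and p(x)p(y)."""
    px, py = marginals(joint)
    return sum(
        p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0
    )


# Hypothetical joint distribution p(x, y).
joint = {("a", "u"): 0.5, ("a", "v"): 0.25, ("b", "u"): 0.125, ("b", "v"): 0.125}
px, py = marginals(joint)

i_xy = mutual_information(joint)
i_yx = mutual_information({(y, x): p for (x, y), p in joint.items()})
i_from_entropies = entropy(px.values()) + entropy(py.values()) - entropy(joint.values())

# X paired with itself: all probability sits on the diagonal.
i_xx = mutual_information({(x, x): p for x, p in px.items()})

print(i_xy, i_yx, i_from_entropies, i_xx)
```

The first three numbers agree, and the last equals the entropy of X alone.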
Mutual information and the symmetry we see here are what will enable the 
conservation principle to be met. As X correlates with Y, the correlation is realized in the same amount 
of mutual information. This is easy to see in Figure III-6. 

6. Message Counting and Message Content - Terms 

The message counting model, seen in Figure III-7, which is typically used, 
provides a very good correlation and is quite linear with time. This may not always be 
the case, but extensive studies on this data clearly showed that the linear fit was best for 
messages. Often, studies in the literature acknowledge that the linear fit only works after 
the initial slow ramp-up phases. Once the initial transient is over, and the system 
achieves a quasi-steady state, the linear fit works well. 

Possibly, information theoretic and dynamical systems models can be built that 
enable richer analysis. The relationships to be developed should ideally be independent 
of the diffusion rate’s functional form: linear, power, polynomial, etc. While the 
explanation is done here for the linear model of message change over time, the general 
approach is developed mathematically independent of the functional form of the message 
rate equation. In this way, the diffusion rate of the technology under examination can dictate 
the form of the function. It turns out that linear, power or polynomial (low order) fits of 
the message versus time step function all work out to be rather well behaved, and 
solvable in a closed form. 
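
The linear message-counting fit can be sketched with ordinary least squares. The cumulative message counts below are invented for illustration and are not the experimental data of this research; the function name is also an assumption.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical cumulative message counts N at time steps k = 1..6.
time_steps = [1, 2, 3, 4, 5, 6]
messages = [100, 210, 290, 405, 500, 610]

slope, intercept = linear_fit(time_steps, messages)
print(f"N(k) is approximately {slope:.1f} * k + {intercept:.1f}")
```

A power or low-order polynomial form could be fitted in the same way after transforming the data, which is why the approach need not depend on the linear choice.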
[Figure: Traditional Model - Message-Counting. Traditional method: count the messages.] 

Figure III-7 Message Counting Linear model 


For an information-communication model to work we need to determine the 
change in entropy over a time step. In Figure III-8, we see how entropy and messages N 
vary over time. Messages are a conserved extensive quantity, and the information 
entropy $S_H$ is related to the quality of message content. The count of terms making up the 
messages N will be indicated by n. 
Figure III-8 Entropy and messages N over time 

Figure III-8 provides the illustration we would like of the joint entropy 
related to a technology at a given time step. Further, we would like a method to compare 
two different technologies, as in Figure III-9. This is done through the mechanism of relative 
entropy. 

Figure III-9 illustrates two technologies. Using relative entropy, we now have a 
mechanism to determine how “close” these technologies are in a crude sense. But there 
are other factors at work. For example, what is the mind share, the volume of nodes 
operating on the messages? 
[Figure: Experiment 2, Cumulative Entropy vs. Year. Java: 2813 terms, 28907 instances, 5330 messages, 6 years. Horizontal axis: k (Years).] 

Figure III-9 Entropy vs time 

7. Interacting Subsystems 

Let’s imagine a super system (the community’s world of knowledge) that consists 
of two subsystems. These subsystems represent what is known and what is unknown at a 
given time. The sum of the two subsystems’ extensive variables, messages N and nodes 
V, is constant. Here the conserved extensive variable properties are the N messages and the 
sum of all the nodes, v, which is the volume V. This will define a control volume. The 
rate of change follows the rate we would expect if this were modeled as an open system 
during these time steps. Now we will take a virtual partition and have it progress, 
expanding subsystem A to the right. As this partition passes over some nodes, effort is 
made by the nodes and they “discover” a term. Terms n are the internal pieces of the 
messages N. Terms are defined as primitive messages. Counting terms is similar to 
counting the messages, but at a finer granularity. The nodes stimulate and change the 
internal configuration of the system by converting an undiscovered term (a null) into a 
communicated discovered term. 

This can be seen in Figure III-10. On the left hand side we see “!!!”s representing 
terms that have been discovered (answers); on the right hand side of the partition we see 
“???”s representing terms that are yet undiscovered (questions). The Venn diagrams 
indicate the subsystems A and B, joint and conditional entropies, and mutual information, as 
illustrated earlier. Examine what this looks like with a sample alphabet as in Figure 
III-11. 

The nulls {} are terms that have not yet been discovered at the frontier of the 
research in time. We might ask whether the null or “???” terms really exist and are 
representative of the real world. Researchers, or any nodes that build work products or 
messages, are actually working toward a yet unrealized collection of answers “!!!”. They 
envision the potential combination of terms (primitive sets) that can make a 
representation of the goal-directed, objective work product that is desired. Certainly, 
during the period of time when a node is developing the answer, the term under question 
exists. Desires, although they are not representational states, do have an object, 
something they are a desire for. This is the “???” term. Desires (see footnote 13), like 
beliefs, are intentional states (Dretske 1988, p. 130). The nulls represent the “???” 
questions desired by research. 

In this simplified example, we are assuming a fixed set of terms in the alphabet 
and a fixed number of nodes. This permits the development of the general 
relationships between extensive and intensive variables in a state equation. Later, once 
we have seen these relationships, we can start with an initial condition representing the 
number of terms and nodes known up to that time. Then we can add more vocabulary or 
more author nodes to the system any way we wish. The rate of change, when reduced to a 
per-node and per-term (specific) rate of the extensive variable, is expected to remain 
similar in the future (open system) to the rates for the historic (closed) subsystems. 
This permits the design of a desired solution in the form of an engine. 

13 Not all desires are realizable. Some desires inherit the referential opacity from the beliefs and other 
desires from which they are derived. “Desires, are like beliefs, referentially opaque. The belief that s is F 
is not the same as the belief that t is G, although s=t and although the predicate expressions “F” and “G”, 
are true of, or refer to exactly the same things.” (Dretske 1988, p. 130). The same is true of an object desired. 
In Sophocles’ ancient Greek play, Oedipus wants to marry Iocasta but does not want to marry his 
mother (and perhaps even wants not to marry his mother), despite the fact that Iocasta is his mother. 


Figure III-10 Interacting Systems A and B 

In Figure III-12 (see footnote 14), we see that as system A expands, the number of 
terms discovered increases at the same rate that the number of terms undiscovered 
decreases. This model satisfies our conservation principle for extensive quantities. 

Next, in Figure III-13, we examine the entropy relationship. The horizontal line 
at the top of the figure is the joint entropy of the system. Since this is a closed system, 
the joint entropy does not change; however, the internal distribution will change. The 
entropy of subsystem A will increase as there are more and more choices to make in 
order to get complete information. The entropy of subsystem B will decrease from a high 
value (all of the unknown terms) to a lower value as less and less is left to be 
discovered. The lower curve shows the mutual information. When the distance between the 
centers of the two probability masses, or subsystems, decreases, there is a higher 
correlation. 

Figure III-11 Subset of an alphabet in two interacting systems !!! and ??? 

[Chart: Messages in Two Subsystems. Interacting Systems A and B (Constant Messages in 
Total System AB).] 

Figure III-12 Messages in two subsystems 

[Chart: Entropy vs. Messages, Two Subsystems. Entropy of 2 Interacting Systems A and B. 
Fitted curve: S(X)_A = -5E-05 n^2 + 0.0228 n + 4.8832. Horizontal axis: n, messages in A.] 

Figure III-13 Entropy vs. Messages, Two Interacting Systems 

14 The charts in this section represent initial data to illustrate the general relationships. Actual 
equations for a specific technology are shown in Chapter IV and the appendix. 


Following reasoning similar to that used in statistical and condensed particle 
physics (Schroeder 2000) (Fraundorff 2000), we can find some useful relationships. The 
slopes of the curves of the two subsystems give us some important information about 
thermal equilibrium. Recall from the canonical ensemble discussion of free energy that 
the temperature T is the parameter controlling free energy, or the conserved property. In 
this case of messages, we can write 

    \frac{1}{T} = \frac{\Delta S}{\Delta n}    (3.25) 


So the temperature is related to the slope of the change in entropy versus change in 
messages curves. When the curves in the figure cross over, the system is at an 
equilibrium point. Let’s look at a general relationship that shows that the increase in 
one system is related to the negative slope, or the decrease, in the other. 

    \frac{\Delta S_A}{\Delta n_A} = -\frac{\Delta S_B}{\Delta n_A}    (3.26) 


The incremental change in S_A, divided by the change in n_A messages, is equal to 
the negative of the change in entropy S_B for system B, again compared to the change in 
the conserved quantity, in this case n_A. Rewriting, we get 

    \frac{\Delta S_A}{\Delta n_A} + \frac{\Delta S_B}{\Delta n_A} = 0    (3.27) 


The second term has a B in the numerator and an A in the denominator. \Delta n_A is 
the same as -\Delta n_B, since what we discover in messages is the same as what is 
removed from the undiscovered system. We can rewrite this for a system at equilibrium as 

    \frac{\Delta S_A}{\Delta n_A} = \frac{\Delta S_B}{\Delta n_B}    (3.28) 


The thing that is the same for both systems when they are at thermal equilibrium 
is the slope of the entropy vs. message graph. This slope must somehow be related to the 
temperature of the system. The 2nd law of thermodynamics tells us that the conserved 
property will tend to flow into the subsystem with the steeper entropy vs. message graph, 
and out of the subsystem with the shallower entropy vs. message graph (Schroeder 2000, 
p. 87). 

According to Schroeder, the former “wants to” gain the free conserved property 
(messages) in order to increase its entropy. If there is an imbalance between the two 
subsystems, the latter doesn’t so much “mind” losing a few messages, since its entropy 
will not decrease much. A steep slope must correspond to a low temperature, while a 
shallow slope corresponds to a high temperature. 
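The slope argument can be made concrete with the quadratic fit printed on the Figure III-13 chart for subsystem A. The sketch below differentiates that fit and takes the temperature as the reciprocal of the slope, following (3.25). Treating the fitted polynomial as differentiable in n, and the function names themselves, are assumptions of this example rather than part of the original application code.

```python
# Quadratic fit reported on the Figure III-13 chart for subsystem A:
#   S_A(n) = -5e-05 n^2 + 0.0228 n + 4.8832
A2, A1, A0 = -5e-05, 0.0228, 4.8832

def entropy_a(n):
    """Fitted entropy of subsystem A after n messages."""
    return A2 * n * n + A1 * n + A0

def entropy_slope_a(n):
    """Analytic derivative dS_A/dn of the quadratic fit."""
    return 2.0 * A2 * n + A1

def temperature_a(n):
    """Information temperature taken as 1 / (dS/dn), as in (3.25).

    Only meaningful while the slope is positive (assumption of this sketch).
    """
    slope = entropy_slope_a(n)
    if slope <= 0:
        raise ValueError("slope must be positive for a finite positive temperature")
    return 1.0 / slope

for n in (0, 50, 100, 150, 200):
    print(f"n={n:3d}  S_A={entropy_a(n):.4f}  "
          f"dS/dn={entropy_slope_a(n):.4f}  T={temperature_a(n):.2f}")
```

Because the fitted slope decreases as n grows, the computed temperature rises with the number of messages, which matches the steep-slope, low-temperature reading given above.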

Now we can see, in the lower curve of Figure III-14, the temperature (the 
right-hand y axis) of subsystem A as the partition moves over the time steps. More 
activity increases the temperature. The temperature is measured in degrees, as it would 
be in a physical system; however, these degrees are developed from information units. 
This is “the” fundamental temperature unit developed from the relationship of entropy 
and the conserved quantity. 

Note that there are temperature fluctuations. This is consistent with Prigogine’s 
observation about evolving systems. A dynamical system will help explain these 
fluctuations. 


Figure III-14 Pressure and Temperature °Saboe© vs. time - two interacting systems 


Pressure is defined as the <messages> processed per node, where <messages> 
represents the average in the time step per node. The important observation is not 
necessarily the form of the equations or the goodness of fit, but rather that the 
pressure can be seen to increase as the temperature increases. While messages are not 
physical molecules as in a thermodynamic system, they seem to behave as a gas might: as 
the temperature goes up, the pressure goes up. 

Figure III-15 shows the relationship between pressure and temperature directly. 
This was developed by taking the fitted curves from Figure III-14 and solving each of 
them for k. Then the pressure P(T) as a function of temperature is determined. 


    P(k) = m_P k + b_P    (3.29) 

    \frac{P(k) - b_P}{m_P} = k    (3.30) 

Similarly, solve for k as a function of T. 

    T(k) = m_T k + b_T    (3.31) 

    \frac{T(k) - b_T}{m_T} = k    (3.32) 

Then we get 

    \frac{P(k) - b_P}{m_P} = \frac{T(k) - b_T}{m_T}    (3.33) 

    P(k) = \frac{m_P}{m_T} \left( T(k) - b_T \right) + b_P    (3.34) 

Plotted in Figure III-15, this relationship appears as the tight set of points 
indicating that as temperature increases, pressure increases. Figure III-15 shows the 
raw data points as well. These fluctuate around the calculated P(T) data, as would be 
expected. 

[Chart: Pressure vs. Temperature (Saboe degrees).] 

Figure III-15 Pressure vs. Temperature °Saboe © 
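The elimination of the time step k between the two linear fits, leading to (3.34), can be sketched as follows. The slope and intercept values are hypothetical and chosen only to make the example run; the actual fit coefficients for a given technology are not reproduced here.

```python
def linear(m, b):
    """Return the linear fit f(k) = m*k + b."""
    return lambda k: m * k + b

def pressure_of_temperature(m_p, b_p, m_t, b_t):
    """Eliminate k between P(k) and T(k): P = (m_p / m_t) * (T - b_t) + b_p."""
    if m_t == 0:
        raise ValueError("temperature fit must have a non-zero slope")
    return lambda t: (m_p / m_t) * (t - b_t) + b_p

# Hypothetical fit coefficients, for illustration only.
M_P, B_P = 0.8, 2.0     # pressure: average messages per node vs. time step
M_T, B_T = 5.0, 40.0    # temperature: degrees vs. time step

p_of_k = linear(M_P, B_P)
t_of_k = linear(M_T, B_T)
p_of_t = pressure_of_temperature(M_P, B_P, M_T, B_T)

for k in range(1, 7):
    t = t_of_k(k)
    print(f"k={k}  T={t:.1f}  P(k)={p_of_k(k):.2f}  P(T)={p_of_t(t):.2f}")
```

For every time step, P evaluated through T agrees with P evaluated directly from k, which is the consistency that the calculated P(T) points in the figure rely on.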

The application written to solve this relationship was also developed for the 
cases of power and 2nd-order polynomial fits. In the application code written for this 
project, all of the permutations were developed: linear pressure as a function of time, 
power temperature as a function of time, power pressure vs. time, polynomial 
temperature, etc. Future efforts will automatically pick the best fit for the technology 
under examination and develop the P(T) function from that. 

Typically, a state diagram viewed by engineers is a temperature-entropy, or T-S, 
diagram (recall Figure II-13). The lower curve of Figure III-16 illustrates the T-S 
relationship. This is the entropy of subsystem A, with entropy on the upper x axis and 
temperature on the secondary y axis on the right. Since this system was not engineered, 
we do not expect to see anything approaching isentropic expansion or a constant-pressure 
temperature increase. 

Figure III-16 Entropy - Messages, and Temperature - Entropy 

The figure also shows the entropy of subsystem A (left y axis) against messages n on 
the x axis. From this information in a closed system, we can see the trends for a given 
technology over time. In a way, we have the ability to define the heat capacity (see 
footnote 15) in bits: say C_p, the heat capacity at constant pressure; C_v, the heat 
capacity at constant volume; or the ratio of the heat capacities, \gamma = C_p / C_v. 
This allows us to move to an open system, like an engine, add nodes (volume), and 
increase message flow. We can then compute the effort required from a desired “engine” 
to develop a technology to arrive at a given time. 

    \Delta U = \dot{n} C_p \Delta T    (3.35) 

This says the change in the “internal” system energy U is related to the message 
flow rate \dot{n} (messages per time step), the heat capacity, and the change in 
temperature from a high temperature to a low temperature. 

15 Heat capacities in state equations are property relations and as such are independent of the type of 
process. C_p is the amount of “stimuli” transferred to a system per unit “message” per unit degree rise 
during a constant pressure process. 

This also implies the equivalent of Carnot’s cycle, which can tell us the maximum 
efficiency we can expect. 

Since “internal” system energy U has been introduced, let’s look at it a bit further. 
It is related to the internal structure, the distribution of the terms. The set of sets 
of terms, reduced to primitive message combinations, follows a Boltzmann distribution 
(Figure III-17). The x axis is the q-level, representing the number of terms in a set. 
The lower curve on the y axis is the frequency of sets. The upper curve assigns a weight 
to each set. The weight simply changes the quantity by a constant, so we can ignore it 
for the purposes of these analyses. It is interesting to note, as well, that these 
curves, plotted over the time steps examined (up to 21 years), essentially remain 
stationary (Figure III-18). 

This change in q-levels (microstates) can be addressed by equations (2.9) and 
(2.10). This permits conjecture on the deeper meanings of the distribution of terms. 
Further, state transitions moving from one q-level to another must somehow be affected 
by an impulsive stimulus of some sort. That implies both the notion of kinetic and 
potential “energy”. This is the result of researchers expending effort to combine 
primitive terms and/or sets, composing more sets of sets. “Discovering” a new single 
term, the first time a ??? augments the vocabulary, is also the result of a change of 
state from a null, {}, to the first instance of an answer, !!!. This too takes effort. 
These topics are subjects for future research. 

Figure III-17 Boltzmann Distribution of Sets of Terms (primitive messages) 

[Chart: Ada Distribution of Messages by q-level.] 

Figure III-18 Set of sets distribution over time steps by q-level 


8. Technology Transfer Channel Elements 

We consider two cases. The deterministic case represents the microscopic 
level of the model in the system, and the stochastic case represents the macroscopic 
system view. So far, we have only addressed the macroscopic case. The deterministic 
case would occur at the micro level in a program, or a system made up of nodes 
consisting of a family of machines. A stochastic system consists of a population, 
coarsely partitioned at the macroscopic level. This is a system made up of a social 
environment consisting of people and organizations. The TechTx models address the 
more general case of the stochastic system of nodes consisting of people, organizations 
and machines. 

We define the community, the macro structure, as a set of performers that produce 
output. An organization is made up of the set of performers affiliated with it. We can 
think of the micro level in terms of the performers. The organizational level is in 
between the macro and micro levels and can be thought of as an ensemble of affiliated 
performers. We can observe individual output from the data. Each record contains 
primitive messages published by a performer x_{ij}, who contributes information to the 
community. This is defined as follows. 

    X = \bigcup_{i=1}^{p} X_i  is the community    (3.36) 

    X_i = \{ x_{i1}, x_{i2}, \ldots, x_{ij} \},  where x_{i1}, x_{i2}, \ldots, x_{ij} are the performers of the i-th organization    (3.37) 

    X_i is the i-th organization, and i = 1 \ldots p    (3.38) 

The output entropy is allocated from the message to the individual author-subset 
performers from the empirical data. This micro level is then summed up and allocated to 
the affiliated organizational level. The organizations are banded based on a 
distribution of the cumulative number of published messages. 

We consider a family of nodes (machines and people, the atomic level), making 
up organizations (the molecular level), and a community (macro level). In a band, we 
assume all of the nodes have equivalent properties, i.e. each organizational node, 
comprised of performing author nodes, is statistically equivalent. Figure III-19 and 
Figure III-20 illustrate a node taking information in as input S(X) and performing some 
transformation, F(X), to produce more messages (work products). Part of the output 
expands the mutual information I(X;Y), the intersection of the Venn diagram, and part 
augments the vocabulary. This augmentation is the conditional entropy S(Y|X), as 
we saw from equation (3.15). 

Figure III-19 Input being converted via a transfer function to output 

Figure III-20 Node transform of input to output 
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The initial band determination is computed based on the accumulation of 
experience of executing tasks, i.e. publishing messages. The most prolific performers are 
banded together based on the average number of messages produced over the period 
examined. Later the learning, or performance index is computed for each band at every 
time step from the beginning of the data set to the (current) performance time step. An 
example of the distribution is shown in Figure III-21. 

We will perform a coarse partitioning of the performing organizations into four 
bands. Further partitions are possible; however, this is sufficient to demonstrate the 
approach. The “A” band consists of all of the organizations that were beyond 3σ in the 
rate of production of messages in the sample for a given technology. The “B” band 
consists of the organizations in the 3σ partition. The “C” band contains the 
organizations with a message production history in the 2σ partition, and the “D” band 
consists of all of the organizations below 2σ in performance. 

Figure III-21 Organization Distribution into Cumulative Task Performed Bands 

Our problem is to realize, or at least to approximate, a given system, which we 
call the true system, by a model. We adjust the parameter values based on a number of 
examples provided by observation of the true system. 

The analyses of the partitions can proceed exactly as the analysis of the macro 
level community. This is the beauty of the partitioning. We only have to be cautious 
when combining bands, because the counts of terms (multiplicity of states) are “local” 
to the band under examination. We count messages in a band and develop the 
probabilities, and hence the entropy of the band is based on the total number of 
messages in the band. In order to aggregate bands, we consider this entropy the band’s 
contribution to the total (all bands) entropy. There is an entropy contribution simply 
resulting from the partition. This contribution varies every time step based on the 
internal organization of the messages, constituent terms, and nodes. 


[Chart: Node Input and Partitioned Output. Legend: |\Omega_D| is the multiplicity of 
terms in the D band; S_{H_i} is the entropy calculated locally for band i; C_i is the 
contribution of band i.] 

Figure III-22 Partitions of output into bands. Contribution to the Community 

Each band i provides a contribution C_i to the community entropy. The local 
band entropy S_{H_i} must be scaled based on the ratio of the multiplicity |\Omega_i| of 
terms in the band to the multiplicity |\Omega| of terms in the world. The community 
entropy, which is sometimes referred to as the technology’s “world” entropy, is the sum 
of the contributions. 

    S_{world} = \sum_{i=1}^{n_{bands}} C_i    (3.39) 

    where  C_i = \frac{|\Omega_i|}{|\Omega|} \left[ S_{H_i} + \left( -\log \frac{|\Omega_i|}{|\Omega|} \right) \right]    (3.40) 


This relationship permits aggregation of previous results on a subset of a 
community with more information later, without having to rerun the entire world and all 
previously analyzed bands. All that is required is the count of the instances of terms 
in a band and the count of the number of instances of terms in the world, augmented by 
these terms. 
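The aggregation of band contributions can be checked numerically. The sketch below assumes that each band's contribution takes the standard grouping form for Shannon entropy, weight times (local entropy minus log of weight), where the weight is the band's share of all term instances, and that the bands partition the world's states with no state shared between bands. The counts, the band names, and the use of base-2 logarithms are hypothetical choices for this example.

```python
import math

def entropy_bits(counts):
    """Shannon entropy (bits) of a distribution given as instance counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def band_contribution(band_counts, world_total):
    """Contribution of one band: weight * (local entropy - log2(weight))."""
    band_total = sum(band_counts)
    weight = band_total / world_total
    return weight * (entropy_bits(band_counts) - math.log2(weight))

# Hypothetical instance counts of terms in each band. Each count is one
# (band, term) state, so the bands partition the world's states.
bands = {
    "A": [40, 25, 15],
    "B": [10, 6],
    "C": [3, 1],
}

world_total = sum(sum(counts) for counts in bands.values())
contributions = {name: band_contribution(c, world_total) for name, c in bands.items()}
world_from_bands = sum(contributions.values())

# Direct computation over every state in the world, for comparison.
world_direct = entropy_bits([c for counts in bands.values() for c in counts])

for name, value in contributions.items():
    print(f"band {name}: contribution = {value:.4f} bits")
print(f"sum of contributions = {world_from_bands:.4f}, direct = {world_direct:.4f}")
```

Under these assumptions, only the band's local entropy and the two instance totals are needed, so a band analyzed earlier does not have to be recomputed when another band is added; only the weights change.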

Later extensions to be considered would address all of the various combinations 
of author nodes producing a message. For example, the x_{ij} performers could be 
represented as combinations of authors producing a record (which, as was pointed out, is 
broken down into its primitive messages at various q-levels). Additionally, we could 
assume that if there are three authors on a record, they represent 2^3 possible author 
subsets (nodes). Each subset is a legitimate combination producing the messages. This 
distribution develops in exactly the same way as the term distribution of sets of sets 
developed earlier. Given the ability to calculate the contribution from the ratio of the 
local system instances to the microstates of a larger or smaller system, it was often 
useful to count instances of states. By computing the entropy locally, these chunks can 
be combined with other subsystems, often without additional computation. 
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C. COMMUNICATION AND CONTROL MODEL 

What has been described thus far is an information-theoretic view of the macro 
world, and a method to partition the world into bands. For now we will continue to work 
at the world level; however, recognize that we can partition the world and demonstrate 
the same relationships. Now we marry a dynamical systems model with the 
information-theoretic model. When both models stabilize at a rate represented by 
equations of the same form, we have moved in the direction of a match between the 
macro (continuous) model and the micro (discrete) model. The true system may be 
considered modeled when we tune parameters in the discrete model and align the entropy 
and conserved property evolution as a function of time. 

1. State Space Representation 

We can represent a map of the state space of a dynamical system. Maps represent a 
simplified form of dynamics that makes it easy for us to compare the individual level of 
description (the trajectories) with the statistical description. Contrary to what occurs in 
ordinary dynamics, time in maps acts only at discrete intervals. Recall that the bakers’ 
transformation (see footnote 16) example illustrates the mixing of a spot of sauce on a 
piece of dough by folding and stretching the dough. In technology maturation, a node is 
locally taking in a chunk of dough, messages out of the pool of messages persistent in 
history, and mixing them along with new information, e.g. a new term, which represents 
yet another spot on the dough. These areas contain remnants from bakers’ 
transformations of other nodes that performed the mixing and adding function throughout 
time. A performing node may perform a number of iterations. Other nodes also perform 
the folding, stretching and mixing function. The mixing may occur before and concurrent 
with mixing at a node. The nodes successively repeat the iteration action. We represent 
this with dynamical system maps, with discrete time n. Let X_{n+1} be the function that 
represents the value corresponding to the application of n bakers’ transformations. 

    X_{n+1} = F(X_n)    (3.41) 

The various functions X_n are functions of internal time. The internal time is an 
operator (see footnote 17) like the one used in quantum mechanics (Prigogine 1989, 
p. 198). The age of partition X_n is the number n of iterations that are to be performed 
to go from X_0 to X_n. 

For ordinary differential equations (continuous in t) this is 

    \frac{dX(t)}{dt} = G(X(t))    (3.42) 

In both cases, X is a vector (see footnote 18). The term orbit will frequently arise 
in the following discussions. The orbit of a dynamical system is the sequence of points 
in the state-space phase plane that corresponds to successive time steps in the system. 
An orbit X_n is generated for a map, and X(t) for the differential equations, when given 
an initial value of X (at n=0 for the map, and at t=0 for the differential equations). 

16 Details are provided in Prigogine 1989, p. 200-204; a summary is shown in the appendix, p. 288. 

Figure III-23 shows a map of the state space. The legend shows the Java entropy 
map marked with a triangle (Δ) and a dashed line as the upper set of points. The marker 
represents data; the dashed line indicates the curve that would fit the data. In this 
case, it is in the general form of a power function, y = 3.46 x^{.44} with R^2 = .9934, 
where S_{H_{k+1}} = b S_{H_k}^m is the specific equation. 

Similarly, the circle (O) and dashed line in the legend are for the Ada points, the 
lower set of points. In this case, the state space map shows that the data is oscillating 
in the early stages. This shows that the vocabulary and threads of research have not 
settled down at first. Based on observation (see Figure III-23), as the entropy 
increases, but at a declining rate, the data starts to approach the y=x line. The spacing 
between successive data points gets closer together. This indicates that the data is 
moving toward a stabilizing attractor basin. 
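The approach of the orbit to the y=x line can be illustrated by iterating a power map of the form S_{k+1} = b S_k^m. The coefficients below, b = 3.46 and m = 0.44, are an assumed reading of the fit quoted for the Java data; the starting value and the number of steps are arbitrary choices for this sketch.

```python
B, M = 3.46, 0.44   # assumed reading of the fitted power map S_{k+1} = B * S_k**M

def entropy_map(s):
    """One application of the discrete-time entropy map."""
    return B * s ** M

def orbit(s0, steps):
    """Iterate the map from s0 and return the visited states."""
    states = [s0]
    for _ in range(steps):
        states.append(entropy_map(states[-1]))
    return states

# Fixed point where the map crosses the y = x line: s* = B**(1/(1-M)).
fixed_point = B ** (1.0 / (1.0 - M))

states = orbit(s0=4.0, steps=12)
gaps = [abs(b - a) for a, b in zip(states, states[1:])]
for k, s in enumerate(states):
    print(f"k={k:2d}  S={s:.4f}")
print(f"fixed point ~ {fixed_point:.4f}; slope of the map there = {M}")
```

With an exponent below one, the slope of the map at the fixed point equals m, so successive points move closer together and the orbit settles onto the crossing with the y=x line, which is the behavior described for the attractor basin.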


17 Operators, eigenfunctions, and eigenvalues are briefly summarized in the Appendix, p. 238. 
18 We use the form X_{n+1} = F(X_n), where X is a k-dimensional vector. 

[Chart: Entropy Discrete Time Map. Axes: entropy S_{k+1} vs. entropy S_k.] 

Figure III-23 Java and Ada State Space Finite Difference Map, S_{k+1} vs. S_k 


The discussion here looks at the attractor of these dynamical systems, since we 
are making the conjecture that the model for technology transfer, or evolutionary 
development, can be represented in this form. If the system being evaluated attracts, 
then the evolution is going toward stability. We would like to be able to say something 
about the confidence as the system stabilizes after the initial conditions die out. 

The attractor is something that attracts initial conditions after the start-up 
transients fade. An attractor is a compact set, A, with the property that for 
almost every (see Farmer 1983) initial condition the limit set of the orbit as k or 
t → +∞ is A. So almost every trajectory in the neighborhood of A passes arbitrarily 
close to every point in A. The basin of attraction of A is the closure of the set of 
initial conditions that approach A. 
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The eigenvalue of the characteristic equation has a relationship to entropy. This 
relationship is through the Lyapunov exponent, which gives the stretching rate per 
iteration averaged over the trajectory. Using the bakers’ transformation, a completely 
deterministic dynamical system can yield results that appear completely random. The 
bakers’ transformation also has the property of all dynamical systems, recurrence. The 
bakers’ transformation is invertible, time reversible, deterministic, recurrent and chaotic. 

Bakers’ Transformation 

    (x_{k+1}, y_{k+1}) = (2 x_k, \; y_k / 2),              0 \le x_k < 1/2 
    (x_{k+1}, y_{k+1}) = (2 x_k - 1, \; y_k / 2 + 1/2),    1/2 \le x_k < 1 

Repeated doublings in the x direction and halvings in the y direction lead to 
rapid mixing. The mapping is completely reversible. Run backwards, the doubling occurs 
in the y direction and the halving occurs in the x direction. 

Figure III-24 Bakers’ Transformation 


Research by Prigogine has also shown that irreversibility is linked only to 
Lyapunov time for general irreversible phenomena such as diffusion and various other 
transport processes (Prigogine 1997, p. 105). We thus have a link between these 
dynamical systems and the technology transfer models herein. In Figure III-24, we 
observe that one direction, x, is expanding while the other dimension, y, is 
contracting. This is similar to our model, where the amount of information that is 
discovered is equivalent to the amount that is no longer undiscovered if the system is 
defined as two subsystems. Another view is to think of a part of the model that is 
restructuring the internal organization of existing information, and the addition of 
more information that is transported across the control boundary. After n consecutive 
iterations, the distance between two points on the x axis will be multiplied by a factor 
2^n = e^{n \ln 2}. More will be said about this; however, according to others 
(Prigogine 1989, p. 254; Farmer 1983; Baker 1990), we have a positive Lyapunov exponent 

    \lambda_1 = \ln 2 

This establishes the dynamic chaotic character of the system. Since this is a 
conservative system, the second Lyapunov exponent is negative, \lambda_2 = -\ln 2. By 
repeating the process indicated in Figure III-24, as time goes on each finite 
subregion will be partitioned into finer and finer strips. If some points (a 
representation of terms) were distributed as in part a of the figure, we can see that 
after n iterations these terms would be diffused, or mixed, in a number of ways. 
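The stretching and reversibility properties of the bakers' transformation can be demonstrated with a few lines of code. The sketch below uses the standard form of the map on the unit square; the starting points, the size of the initial separation, and the number of steps are arbitrary choices, and the function names are introduced here only for illustration.

```python
import math

def baker(x, y):
    """One forward step of the bakers' transformation on the unit square."""
    if x < 0.5:
        return 2.0 * x, y / 2.0
    return 2.0 * x - 1.0, (y + 1.0) / 2.0

def baker_inverse(x, y):
    """One backward step: doubling in y, halving in x."""
    if y < 0.5:
        return x / 2.0, 2.0 * y
    return (x + 1.0) / 2.0, 2.0 * y - 1.0

def stretching_exponent(x0, delta=1e-9, steps=10):
    """Estimate the expanding Lyapunov exponent from the growth of a small
    separation in x between two nearby points after a few iterations."""
    (xa, ya), (xb, yb) = (x0, 0.3), (x0 + delta, 0.3)
    for _ in range(steps):
        xa, ya = baker(xa, ya)
        xb, yb = baker(xb, yb)
    return math.log(abs(xb - xa) / delta) / steps

point = (0.3, 0.7)
forward = baker(*point)
print("forward:", forward, "back again:", baker_inverse(*forward))
print("estimated exponent:", stretching_exponent(0.2), " ln 2 =", math.log(2))
```

The step count is kept small on purpose: in floating-point arithmetic, repeated doubling discards one bit of the initial condition per iteration, so long orbits of this map are not numerically meaningful.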

Further discussion can be found in Appendix A, Information, Control Theory and 
Evolutionary Dynamical Systems Basics (p. 273), as well as in Prigogine (Prigogine 
1983, 1989, 1997), Farmer, Ott and Yorke (Farmer 1983), McCauley (McCauley 1993), and 
Baker (Baker 1990). The following description follows the development found in Farmer 
(Farmer 1983) and Baker (Baker 1990). 

So in Figure III-23, we see a plot of a one-dimensional map. Taking the 
derivative of F(X_n) in this case yields \Lambda. The goodness of fit is determined 
through the finite difference method. It defines convergence and stability points in 
dimensions using the Lyapunov number \Lambda. 

The Lyapunov numbers quantify the stability of an orbit around an attractor. The 
Lyapunov numbers are the absolute values of the eigenvalues of the Jacobian matrix at a 
fixed point. A discussion of the orbits, convergence, and stability for roots of different 
eigenvalues is covered in the appendix (Brown 2000, and Saboe 2001). 
The eigenvalues j are the roots of the characteristic equation |A - jI| = 0, where A 
is the Jacobian of the transformation 

    X_{n+1} = F(X_n) = T X_n    (3.43) 

given by 

    A = \frac{\partial(x, y)}{\partial(u, v)} = 
        \begin{bmatrix} 
          \partial x / \partial u & \partial x / \partial v \\ 
          \partial y / \partial u & \partial y / \partial v 
        \end{bmatrix}    (3.44) 

The Jacobian is defined by (3.44). The vectors X_{n+1} and T X_n are defined in 
boldface characters. Other restrictions on (3.44) are that the functions x = x(u,v) and 
y = y(u,v) have partial derivatives, that the point (x,y) corresponding to any (u,v) in 
R* lies in R, and, conversely, that to every point (x,y) in R there corresponds one and 
only one point (u,v) in R* (Kreyszig 1993, p. 519-520). 

The relationship of the difference equations representing the dynamical system to 
entropy, through the Lyapunov number, is defined as 

    J_n = [ J(x_n) J(x_{n-1}) \cdots J(x_1) ]    (3.45) 

where J(x) is the Jacobian matrix of the map, and j_1(n) \ge j_2(n) \ge \cdots \ge j_p(n) 
are the magnitudes of the eigenvalues of J_n. A is the Jacobian matrix of the 
transformation T. 

The Lyapunov numbers are 

    \Lambda_i = \lim_{n \to \infty} [ j_i(n) ]^{1/n},  i = 1, 2, \ldots, p    (3.46) 
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The Lyapunov number is the smallest, positive, real nth root taken. We follow 
Farmer’s assumption that almost every (Farmer’s emphasis) initial condition in the basin 
of the attractor has the same Lyapunov numbers (see footnote 19). This followed from his 
empirical evidence, and the data in this model does not appear to meet the exceptional 
conditions that he identifies. 
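For a one-dimensional map the Jacobian reduces to the derivative, so the product in (3.45) becomes a product of derivatives along the orbit and the Lyapunov number in (3.46) is its n-th root. The sketch below applies this to a power map S_{k+1} = b S_k^m; the coefficients are an assumed reading of the Java fit, and the starting value is arbitrary. The product is accumulated in logarithms, a common numerical precaution that is a choice of this example.

```python
import math

B, M = 3.46, 0.44   # assumed power-map coefficients, S_{k+1} = B * S_k**M

def f(s):
    """The one-dimensional entropy map."""
    return B * s ** M

def f_prime(s):
    """Derivative of the map; in one dimension this is the 1x1 Jacobian."""
    return B * M * s ** (M - 1.0)

def lyapunov_number(s0, n):
    """n-th root of |f'(s_n) ... f'(s_1)|, accumulated in logs for stability."""
    s, log_sum = s0, 0.0
    for _ in range(n):
        s = f(s)
        log_sum += math.log(abs(f_prime(s)))
    return math.exp(log_sum / n)

for n in (5, 50, 500):
    lam = lyapunov_number(4.0, n)
    print(f"n={n:3d}  Lyapunov number ~ {lam:.4f}  exponent ~ {math.log(lam):.4f}")
```

As n grows, the estimate approaches the slope of the map at its fixed point, which for this power form equals m. A Lyapunov number below one (a negative exponent) indicates a stable, attracting orbit rather than a chaotic one.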

These dimensions represent an entropy measure for non-linear systems in stable 
or chaotic regions. 

We compute entropy in two ways. One is from experimental data. The other is 
from a model of the process of transferring (transforming) information. The 
experimental entropy data are related to the content of a message, i.e. the information 
we know about a topic. We refer to this as Shannon’s entropy (S_H). The data S_H is 
gathered over k time steps. 

We perform regression analysis on this data and therefore have a function of the 
power form, e.g. y = b x^m. This is 

    \log y = \log b + m \log x    (3.47) 

where m is the slope and log b is the intercept in linear form. We also have a 
model of a non-linear dynamical system. The Lyapunov exponent of a map gives the 
sensitive dependence upon initial conditions that is characteristic of chaotic behavior. 
Further discussion can be found in Prigogine (Prigogine 1983), Farmer, Ott and Yorke 
(Farmer 1983), McCauley (McCauley 1993), and Baker (Baker 1990). The following 
description follows the development found in Farmer (Farmer 1983) and Baker (Baker 
1990). 

19 The Lyapunov exponent is the logarithm of the Lyapunov number for the 
eigenvalues of the characteristic equation (Farmer 1983). 
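The power-form regression of (3.47) amounts to an ordinary least-squares line through the logarithms of the data. The following sketch shows one way to carry it out without external libraries. The data series is synthetic and noise-free, generated from known coefficients purely to show that the fit recovers them; it does not represent measured entropy values.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log b + m log x; returns (b, m, r_squared)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    syy = sum((v - mean_y) ** 2 for v in ly)
    m = sxy / sxx
    log_b = mean_y - m * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return math.exp(log_b), m, r_squared

# Synthetic, noise-free series generated from y = 5.0 * k**0.3 (illustrative only).
ks = [1, 2, 3, 4, 5, 6]
entropy = [5.0 * k ** 0.3 for k in ks]

b, m, r2 = fit_power_law(ks, entropy)
print(f"b = {b:.4f}, m = {m:.4f}, R^2 = {r2:.4f}")
```

Note that the R^2 reported this way measures the goodness of fit in log-log space, which can differ from the value computed on the untransformed data.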


2. One Dimensional Finite Difference Representation of S_H 

We determine the one-dimensional model for computation of this entropy for the 
TechTx Basic Entropy model in a form compatible with the two-dimensional micro level 
model. This is 

    S_{H_{k+1}} = f(S_{H_k})    (3.48) 

    \Lambda = f'(\cdot)    (3.49) 

The macro entropy is partitioned and allocated to the performer and affiliated
organization nodes. This enables computation of the system entropy at the nodal level.
This provides the method of computing the Lyapunov dimension from $\Lambda$ to measure the
non-linear system entropy $S_{B_{\text{micro}}}$ at the micro level or, for simplicity of notation, $S_B$. Note
that this differs from the entropy $S_H$ in Figure III-9, which is the information entropy,
NOT the entropy measure for the stability or chaos of the system.

The general form for the transformation is $S_{H_{k+1}} = f(S_{H_k})$. We have from our
earlier TechTx Basic Entropy discussion the macro entropy vs. time.

We develop the relationships using a power law here. However, as 
experimentation progressed, it became apparent for the technology we were evaluating 
that the messages were varying over time linearly and the entropy seemed to follow a 
power form. 

As the power law may be the right fit for some technologies, we develop this
more general relationship here. For the linear fit, the derivative reduces simply to a
constant, the slope m. In the end, the linear fit proved to be a very good
and simple relation that gave most satisfactory results. While we recognize that we have
to partition and allocate the entropy to the performing nodes, we can use the macro
function for illustrative purposes here. Having fit the entropy over time, we have a power
function in the general form $S_{H_k} = b k^m$.

To derive the finite difference form, we have

$$\begin{cases} S_{H_k} = b k^m \\ S_{H_{k+1}} = b (k+1)^m \end{cases} \qquad (3.50)$$


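Recall that the general form of the finite difference transform is $S_{H_{k+1}} = f(S_{H_k})$, as in (3.48).
To obtain the derivative, we use (3.50) to eliminate k, resulting in

$$S_{H_{k+1}} = b \left[ \left( \frac{S_{H_k}}{b} \right)^{1/m} + 1 \right]^m \qquad (3.51)$$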


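To find $\Lambda$ we differentiate (3.51) and get

$$\Lambda = \frac{dS_{H_{k+1}}}{dS_{H_k}} = \left[ \left( \frac{S_{H_k}}{b} \right)^{1/m} + 1 \right]^{m-1} \left( \frac{S_{H_k}}{b} \right)^{\frac{1}{m} - 1} \qquad (3.52)$$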


Recall that $\Lambda$ was required to compute the Lyapunov dimension, from which we
measure the non-linear system entropy $S_B$ to quantify the stability of the system.
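As a rough check on (3.51) and (3.52), the one-dimensional map and its derivative can be evaluated numerically. This is a sketch under stated assumptions: the function names and the parameter values b = 2.0, m = 0.5 are illustrative and do not come from the fitted data.

```python
import math

def next_entropy(s, b, m):
    """One step of (3.51): S_{k+1} = b * ((S_k / b)**(1/m) + 1)**m."""
    return b * ((s / b) ** (1.0 / m) + 1.0) ** m

def lyapunov_number(s, b, m):
    """Derivative of the map, as in (3.52), evaluated at S_k = s."""
    r = (s / b) ** (1.0 / m)          # recovers the time index k
    return (r + 1.0) ** (m - 1.0) * (s / b) ** (1.0 / m - 1.0)

b, m = 2.0, 0.5      # illustrative fit parameters, not from the source data
k = 4
s_k = b * k ** m
s_next = next_entropy(s_k, b, m)
lam = lyapunov_number(s_k, b, m)
```

Because $(S_{H_k}/b)^{1/m} = k$, the derivative reduces to $((k+1)/k)^{m-1}$, which is below 1 whenever m < 1. For a linear fit (m = 1) it is the constant 1 in these variables.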


3. Two-Dimensional Finite Difference Representation of $S_H$

Similarly, we develop a two-dimensional model using the finite difference
method. For n-dimensional maps, there are n Lyapunov numbers $\Lambda_n$, since stretching can
occur along each axis.

A two-dimensional model is used for the computation of the Lyapunov dimension
from $\Lambda$ to measure the non-linear system entropy $S_B$.

$$\begin{cases} S_{H_{k+1}} = F(S_{H_k}, N_k) \\ N_{k+1} = G(S_{H_k}, N_k) \end{cases} \qquad (3.53)$$


Functions F and G are defined as one-to-one functions in R. We assume that the
partial derivatives exist. We now use $\Lambda$ as defined in (3.46) or (3.52), $\Lambda = \lim_{n \to \infty} [j_i]^{1/n}$, where
$j_i$ are the eigenvalues from $|A - jI| = 0$ and A, the Jacobian of the transformation, is defined as

$$A = DT = \frac{\partial(F, G)}{\partial(S, N)} \qquad (3.54)$$


Here we are computing F and G to develop the transfer function and to correlate
these two dimensions to determine $S_B$ from $\Lambda$, the Lyapunov number. The interesting
feature of the bakers' transformation is that it is a dissipative function in state space, since
the sum of the exponents is negative (Baker 1990 p122).
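The dissipative property can be illustrated with a textbook form of the baker's map that contracts by a factor a < 1/2 in one direction and stretches by 2 in the other. This is not the F and G of this model; it is a sketch chosen because its Jacobian is constant, so the Lyapunov numbers are simply the moduli of its eigenvalues. The value a = 0.3 is an assumption for the example.

```python
import numpy as np

def dissipative_baker_jacobian(a):
    """Jacobian of a simple dissipative baker's map with contraction a in x
    and stretching by 2 in y; it is the same on both halves of the unit square."""
    return np.array([[a, 0.0],
                     [0.0, 2.0]])

a = 0.3                                   # illustrative contraction, a < 1/2
jac = dissipative_baker_jacobian(a)
numbers = np.abs(np.linalg.eigvals(jac))  # Lyapunov numbers (constant Jacobian)
exponents = np.log(numbers)               # Lyapunov exponents
exponent_sum = exponents.sum()            # equals log(2 * a)
```

The sum of the exponents is log(2a), which is negative for a < 1/2: areas in state space shrink at every step even though one direction is stretched.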

The entropy developed via the discrete (micro) dynamical systems model and the macro
level computations should both change at the same rate, since we are observing the same
system. The performance index parameters are adjusted to tune the micro model and to
match $S_B$. This provides a method to identify the performance bands and half-life of
performance improvement, or maturing of the technology.

4. Micro Level Coupled Nodes Communicating


Let's give an example of information being exchanged at the micro level.
Consider some coupled nodes in a communication system. This example is adapted from
Brown (Brown 2000). The system described will be represented in a dynamical system
model, which ends up being the bakers' transformation.

This can be represented in a model of information and the state as it flows between
the advocate and receptor, as seen in Figure III-25. Model the following communication
nodes: a sender (S), a receiver (R), and a consumer (C). A simple function with inputs as
messages and outputs as messages associated with each node carries the dynamical
information about each node.


[Figure III-25 panel titles: "Dynamical System of the Advocate-Receptor TechTx Interaction"; "State Diagram of Information Flow in Nodes of a Technology Transfer Organization, Micro and Macro"]

Figure III-25. Dynamical System Model of Advocate-Receptor Interaction.
The sender is an advocate. This is a researcher or, in the terms of Fowler (Fowler
1994), an advocate and producer. The sender issues new work products as messages. The
receiver is a change agent, or the receptor. The sender develops research, advances and
publishes a message as a work product, thesis, article, technical report, demo, etc. The
message is observable, i.e. measurable and countable. We can generally only measure
output. We can measure output in terms of messages and the terms from which the messages
are made up. Except for one type of input, it is usually difficult to quantify or measure
all of the input.

The receiver receives the message. If the message is understood completely, i.e. 
no need for clarification, the receiver retransmits the processed message and a local state 
transition occurs on the node, as the receiver becomes a sender. The consumer node 
becomes a receiver, and so on, further down the technology transition food chain. On the 
other hand, if some percentage of the messages is not understood, the receiver asks for 
clarification in terms of feedback from the sender. The sender then sends clarification in
response to the request for clarification. Another way to look at the request for
clarification is that, as a receptor or researcher, we check the literature. The percentage of
information we use is the complement to the request for clarification. The feedback gave
us satisfactory answers. It becomes input from the world of persistent information
available through time. This is the part of the world of information of sets of sets of
terms (primitive messages) that the performer will restructure.

Once the consumer understands the message, the consumer can execute the work 
products. Since a change agent becomes a sender, and the consumer becomes a receptor, 
each is capable of issuing requests for clarification and providing clarification. 

This elemental system (Figure III-26a) consists of a send unit and a receive unit.
The receive unit is able to retransmit or execute an action when there is little uncertainty
in the terminal action to be taken. At that point, the receiver executes the action and
becomes a send unit, since someone else (another potential receive unit) can witness the
evidence of a signal. Let's assume for the moment a clear, noiseless signal from the
sender. If the receive unit understands the encryption and protocol of the sender, it is
able instantaneously to resend the message or to act. No effort is required to handle the
encryption and protocol.

If the message received is well understood, the unit R (at time step $t_k$) can receive
the messages from unit S (sent at time step $t_{k-1}$) immediately and resend them or perform an
action, observable as a message, to another (or the same) receiver at a later time step
($t_{k+1}$). Figure III-26 shows this basic state transition model. Note that there is also a term
p' representing message state transition arcs for feedback. The message traffic from the
receiver R is a sum of the fraction of messages from the earlier send unit's production and
multiple streams persistent in history that are available to the receive node and selected
(filtered) as input. The sum of the messages is available to be processed by node R.


5. Entropy in the Communication Control Model 

We can also have the case where there are messages with entropy (noise, or
unknown signal) as input to R. This can be accommodated as seen in Figure III-26b.
Now, we add the concept of a "think" state transition. This is the case where the
messages received could not be effectively processed. Some internal processing is
required. There is yet another type of "think" state transition. This is represented by
feedback in order to clarify the entropy, noise or non-signal received. Figure III-27
illustrates the elemental notion presented in Figure III-26b and adds two feedback loop
state transition arcs $p_4$ and $p_5$. For initial model development and clarity, we assume that
the quantity of messages in the think loop $p_3$ is equivalent to the number of messages sent
back to the send unit in $p_4$. These are subsequently fed to a receive unit as clarification at
some later time step as $p_5$. It is possible that the send unit has to use multiple time steps
and its own think loop. Further, it is possible that the receive unit has to do more internal
processing (and learning), which could store, for more than one time step, a number of
prior messages awaiting action. We want to avoid or minimize a design that has this
characteristic. The system would appear to have slow response to transients, and the
hysteresis effects resulting from these time step delays can put the node and system in an
unstable mode of operation. While some of this effect is unavoidable, the model should
be able to accommodate these aspects as well. We essentially hide this inside the node's
performance function. Refinements to this engineering model can be added later.

The nodes can be in two states, $x_k$ and $y_k$, in phase space. The state represented by
variable $y_k$ is the quantity of messages or task orders that have been executed by an
organizational unit, or node, at time $t_k$. The state $x_k$ is the quantity of messages / task
orders received by the organization at time $t_k$. $x_k$ consists of two parts. One is the
quantity of messages / task orders that the node adds to the system. In a sense, new terms
are added across the control boundary, so they appear to arrive from outside the
organizational node. The second part is the set of internal messages / task orders that
must be processed/executed by the unit due to the content of the messages / task orders
processed in the previous time step $t_{k-1}$ (feedback).

[Figure III-26 panel title: "Software Technology Transition Communications State Model, 'Basic' and with 'think state'"; panel a) Basic state transition, interaction, well understood effort]

Figure III-26. Software Technology Transition Basic and "Think" State.
(Source: Saboe2001)
[Figure III-27 panel title: "Software Technology Transition Communications State Model, 'Think' and feedback"]

S = send node state (of a unit), typical of outside signal from earlier time steps
R = receive node state (of a unit), with think and feedback states

State variables:

p = probability, the property that must be conserved
$x_k$ = $u_k$ = quantity of messages received from outside at time $t_k$ ($p_1$ and $p'_{5,k-1}$)
$y_k$ = quantity of messages executed at time $t_k$ ($p_2$)
$z_k$ = quantity of messages due to $t_{k-1}$ clarification plus $x_k$
$p_4$ feedback ≈ $p_3$ internal processing at time $t_k$
$p_5$ clarification ≈ $p_4$ feedback at time $t_k$, delayed by one time step to $t_{k+1}$
$p'_{5,k-1}$ clarification ≈ outstanding feedback messages from prior time steps that will be received as $x_k$,
and multiple streams persistent in history and available to the receive node which may be processed

Figure III-27. Software Technology Transition "Think" and Feedback.
(Source: Saboe2001)

On the other hand, let's assume that the receiver has to process some internal
messages in order to unpack the message. Now there is a delay before the message can
be resent. Going a little further, if the receiver received noise, an unclear signal, or an
unknown signal, it may have to request clarification, delaying a time step, or do some
additional correction processing. This uses up node capacity. We know from experience
that when we are fully consumed with a project, day and night, we are not available for
other tasks. This capacity can even limit interaction with the external environment (e.g.
in extreme cases, this capacity can even be unavailable for the researcher's family). If
the message is simple and concrete, or agrees in abstraction (state level), or is a higher
level meta-statement, the amount of processing and effort that it takes to correct the poor
signal is less than for one that is more complicated and more densely packed. From this, we
might say that abstraction is a form of information hiding. Encapsulation of this form
provides leverage and can reduce the "entropy" of the system. The complexity of the
structure of the message is higher, but the communication uses less bandwidth.


D. DYNAMICAL SYSTEMS MODEL 

Assume we have available a macro level model of technology transfer to 
represent the community level technology maturation. That macro model can identify the 
stability and convergence of an ensemble of nodes. The macro model can be partitioned 
into a number of nodes (organizational units and sub units that compose the 
organizational units). The macro model is represented in terms of entropy dimensions of
natural measure (Farmer 1983), i.e. both the information entropy $S_H$ and the bakers'
transformation entropy $S_B$, representing the transfer (transform) function. We now would
like to develop a model that represents the interaction between nodes at the micro level. 
This model will complete a linkage from macro to micro levels and permit 
implementation models (infusion, learning, etc.) to bridge to the macro-micro 
infrastructure scale models. This section will explore a feedback model at the 
organizational node and sub-organizational node level. We incorporate control theory and 
use the bakers’ transformation. 

The model should incorporate a factor for learning, and address requests for 
clarification and the ability to model the process load in requesting clarification messages 
and receiving clarification messages. This model will permit tuning an organization to 
ensure efficient processing of technology messages. We will develop a node response
curve and an associated system response curve; these can be developed from the macroscopic
view. Determination of the bakers' transformation entropy from the Lyapunov number
and exponent will permit an assessment of the node performance in terms of stability and
confidence of convergence to a steady stable state or a chaotic state.

1. Assumptions

Assume nodes made up of people and machines that can do a task, such as publish 
a work product as a message. A node is modeled in terms of the messages it receives
versus the messages it processes. The work product (message) is the representation of
something that can be understood by communicating in terms familiar to the sender and 
receiver. For instance, a map is not the road system but symbols from a vocabulary of 
terms that represent a common understanding of the lay of the land of a road system. The 
terms are measured in information units - bits. As input, the processing node receives 
work product. These represent messages. Output from a node is also observed and 
measured in messages. A technology generating or processing node produces the output 
by acting on input to reduce uncertainty in the cause and effect relationship involved in 
achieving a desired result. This is reasonable since this is what elements of a node do. 
This is true for the activities of researchers and producers in general as advocates, and of
receivers, change agents, and consumers as receptors. This assumption is also consistent
with the observation by Rogers (Rogers 1983). Within this context, we examine the
meaning of the concepts of stability, equilibrium, attractors, chaos, eigenvalues, and
eigenvectors, and their relationship to technology transition and system node dynamics.
Convergence of an organizational node on a fixed point depends on the nature of the
eigenvalues of the derivative of the dynamical system at the fixed point. The direction of
convergence depends on the direction of the eigenvectors. A useful term that will
frequently arise in the following discussion is an "orbit". The orbit of a dynamical system
is the sequence of points in the state-space phase plane that corresponds to successive
time steps in the system. We discuss seven cases in the appendix.

2. Context 

We assume that all of the nodes have functions of equivalent form. As described 
in the TechTx Entropy Learning Curve model, nodes, in different performance bands, 
inherit the performance parameters of their band. The node is modeled in terms of the 
messages it receives versus those it carries out or processes. The individual nodes are
assumed heterogeneous, varying in size and composition, or a mix of people with varying
skills and tools to perform the function. For ease in validation computations, we assume
that the organizational nodes with a performance index in the range of ±1σ of
the mean (recall Figure III-21) all have the same learning curve function
parameters. We can partition the volume down into finer and finer bins. The best model
would look at all of the sets of sets of performer combinations and partition these into
q-levels. For now, however, it suffices to allocate the nodes with statistically similar
performance to one of the four appropriate bins.

Should we wish to calibrate an individual node or all of the nodes in the band, the 
model will still be applicable. The capacity of a node in the band can be calculated. The 
volume and complexity of messages acted on and generated applies pressure to an 
organizational node. Demands on the organizational node as a sender or receiver 
component are among the pressures that require modeling and analysis. Other pressures 
are internal to an organizational node to ensure smooth functioning. These internal 
pressures come in the form of messages as well, and procedures, interfaces, meetings, 
collaborations and other interactions that consume resources. These are important facets 
to model since they provide feedback pressures on the components. External pressures 
are also among the features that determine organizational node dynamics and this should 
be modeled. 

All of the pressures mentioned so far can be thought of as messages passing 
between organizational nodes and between the organizational nodes and the environment. 
This concept facilitates modeling organizational node states that can be organized as 
messages received by the component and processed by a component. In this respect, the 
organizational nodes are analogous to a communications network. The analogy is simple
and useful. There are, however, at least two important differences. One is that an
organization will adapt to and absorb pressures that would cause a network to break down.
This is because the network is not hardwired. It is also difficult to predict the breakdown
capacity in advance. We have somewhat addressed ranges of capacity by banding the
organization into performance index bands. This, however, does not mean that a node is at
capacity. The potential for the technology transfer system to break down is important to
model. A simple source of collapse is when the demands on the system exceed its ability
to adapt, and the node reaches a state of demoralization (see footnote 20). This is important since it can
result in a component ceasing to communicate, or the communications decreasing to a
critical level. In the communications network analogy, the number of messages being
processed begins to decay until it reaches an inoperable level or is zero. We have
mechanisms to model this; however, for purposes of illustrating the model, we assume the
nodes are at or below capacity. We can ignore this over-capacity breakdown issue for now.

The model for organizational dynamics is drawn from (Brown 2000). This model
can be represented in state space using the messages (N) and the primitive message term sets
of sets (n). This can be related to entropy ($S_H$). For ease of notation, we drop the subscript
indicating that this is entropy in the terms of Shannon.

The state space is mapped onto the x-axis (input) and y-axis (output) as follows:
x is the input $N_k$, in messages represented as entropy in information units, and y is the
output $N_{k+1}$, where k represents the time step. (The internal time is an operator, and not a
number.) We would not have synchronous discrete time steps in a network that includes
nodes comprised of organizations and people.

$$N_{k+1} = f(N_k) \qquad (3.55)$$

This function represents the bakers' transformation. For the ensemble of nodes
performing the function $N_{k+1} = f(N_k)$, we have the vector representation $\mathbf{N}_{k+1} = F(\mathbf{N}_k)$.

We narrow our discussion from the ensemble of messages operated on by nodes,
which appear on the network or disappear, to a typical group of nodes: the sender,
receiver and consumer.

The model uses two state variables: one variable of the system node representing
the messages received and one for messages processed. We shall apply the message
information in terms of the entropies of the incoming and processed messages. The
significance of the system of equations is that the eigenfunction characteristic equation
represents the bakers' transformation of folding, stretching and rotating. The eigenvalue
of this dissipative function is also entropy, and it represents mixing. The appendix
examines a number of cases and discusses the potential significance of the values of the
eigenfunction.

20 The overheating of the internet dot-com start-ups is an example of organizational nodes that were
under too much pressure. Competing at "internet time" caused many organizational burnout tragedies.

Prigogine (Prigogine 1989 p198) summarizes how the general properties of a
dissipative dynamical system can be represented and how such a system evolves. He states
that the very existence of dissipative dynamical systems is a manifestation of the second law of
thermodynamics.

3. Dynamical Systems Model Equations 

Now we will develop the equations for this model. The relationship between the
state transition diagram and a dynamical system is shown in Figure III-28. The sender
publishes messages $u_k$ (a natural number of messages) at time step k. Input messages at
time step k to the receiver are indicated by $x_k$ (a natural number of messages). The output
messages from the receiver at time step k are given by $y_k$ (a natural number of messages).
Some percentage of the messages output from a prior time step, $y_{k-1}$, is indicated by $\beta$, a
rational number.

This process is repeated for the next time step, $x_{k+1}$ and $y_{k+1}$. The crossed circle
immediately to the left of the receiver node represents the collection point where the
different parts of the input message stream are combined for the input message count $x_k$.

In Figure III-28, $f(x_k)$ represents the function that transforms the input messages into
output messages. It takes a time step to complete the processing. A way to view the
node's processing is that for a message to move through a node, it takes a time step.

[Figure III-28 panel title: "Relationship of State Diagram to Dynamical Systems Model"]

Figure III-28. Dynamical Systems Model.

The $x_k$ state variable consists of two parts. One part of the state is the messages
that come from outside the receiver node, $u_k$, i.e. from the sender node. These are new
messages consisting of terms that can be either the conversion of questions {???} to
answers {!!!} from term sets that were previously nulls {}, or answers
{!!!} that may have been previously discovered, which contribute to more mutual
information. The second part of the state variable is clarification of messages that was
requested from the previous time step, $y_{k-1}$. Initially we assume that the quantity of
messages processed ($y_k$) is a function of $x_k$. As we said earlier, while it may appear that
we could have non-determinism here, this is not the case. If we could know all of the inputs,
there would be a deterministic relationship; however, it is impossible to know all of the inputs. We
must distinguish between non-deterministic and probabilistic. We simply don't have
enough information to accurately predict the result. A function could be a reasonable
approach. This function has the following properties:

(1) if $x_k = 0$ then $y_{k+1} = 0$

and

(2) as $x_k \to \infty$ then $y_k \to 0$

Condition (1) says that if there is no input at time step k, there is no output. This
holds only if there are no messages stuck in the node or latent messages in the form of
clarification coming in from prior execution steps. Condition (2) says that the system
grinds to a halt if the message demand is too great. We can assume that as the number of
messages received becomes infinite, the messages processed have to approach some
limiting value, which is the capacity of the system. However, for the systems we are
seeing, we are not at capacity, and
this condition can be finessed out of the picture in low pressure, low temperature
situations. We can determine when this happens by partitioning the macroscopic
community into smaller and smaller partitions. Then we can observe the performance of
nodes with a technology and in the environment of the day. The system can be represented
by the following equations:

$$\begin{aligned} x_{k+1} &= \beta y_{k-1} + u_k \\ y_{k+1} &= f(x_k) \end{aligned} \qquad (3.56)$$
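To make the recurrence in (3.56) concrete, the sketch below iterates it for a constant external message stream. The response curve `node_response` is a hypothetical choice made only because it satisfies conditions (1) and (2); the text does not prescribe a functional form, and the values of `gain`, `capacity`, `beta` and the inputs are likewise assumptions.

```python
import math

def node_response(x, gain=1.0, capacity=50.0):
    """Hypothetical response curve: f(0) = 0 and f(x) -> 0 as x -> infinity."""
    return gain * x * math.exp(-x / capacity)

def simulate(u, beta, f=node_response):
    """Iterate x[k+1] = beta*y[k-1] + u[k], y[k+1] = f(x[k]) from a zero state."""
    n = len(u)
    x = [0.0] * (n + 1)
    y = [0.0] * (n + 1)
    for k in range(n):
        y_prev = y[k - 1] if k >= 1 else 0.0   # no output before the first step
        x[k + 1] = beta * y_prev + u[k]
        y[k + 1] = f(x[k])
    return x, y

inputs = [5.0] * 12                  # constant external message stream (illustrative)
x_hist, y_hist = simulate(inputs, beta=0.2)
```

The two-step lag in the feedback term is visible in the output: the first clarification traffic reaches the input only three steps after the first external messages arrive.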


$f(x_k)$ is called the node response curve. For this exposition, we need only concern
ourselves with the node response curve and its ultimate relationship to the macroscopic
information theoretic model. The above is a second order system of finite difference
equations with the response curve $y_{k+1} = f(x_k)$. It can be represented by the following
three-dimensional dynamical system, where $z_k$, clarification from the prior time step, is
substituted for $y_{k-1}$. Using the mapping referred to in (3.43) and (3.44)
(Kreyszig 1993 p419) we get

$$\begin{pmatrix} x_{k+1} \\ y_{k+1} \\ z_{k+1} \end{pmatrix} = T \begin{pmatrix} x_k \\ y_k \\ z_k \end{pmatrix} = \begin{pmatrix} \beta z_k + u_k \\ f(x_k) \\ y_k \end{pmatrix} \qquad (3.57)$$


Let X, Y, Z denote the state at time step k+1.

The periodic points determine the dynamics of the system. In particular, the fixed
points are of interest. These are the equilibrium points. The coordinates of the fixed
points are given by

$$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} \beta z_k + u_k \\ f(x_k) \\ y_k \end{pmatrix} \qquad (3.58)$$

The fixed-point condition becomes $x = u + \beta f(x)$. The derivative of the
transformation T is given by the Jacobian
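$$DT = \frac{\partial(X, Y, Z)}{\partial(x, y, z)} \qquad (3.59)$$

$$DT = \begin{pmatrix} \dfrac{\partial X}{\partial x} & \dfrac{\partial X}{\partial y} & \dfrac{\partial X}{\partial z} \\ \dfrac{\partial Y}{\partial x} & \dfrac{\partial Y}{\partial y} & \dfrac{\partial Y}{\partial z} \\ \dfrac{\partial Z}{\partial x} & \dfrac{\partial Z}{\partial y} & \dfrac{\partial Z}{\partial z} \end{pmatrix} \qquad (3.60)$$

$$DT = \begin{pmatrix} 0 & 0 & \beta \\ f'(x) & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \qquad (3.61)$$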


where DT is the Jacobian of transformation T. We find the eigenvalues $j_i$, which are
the roots of the characteristic equation

$$|A - jI| = 0 \qquad (3.62)$$

where A is the Jacobian DT of the transformation and I is the identity matrix.
More specifically, the determinant

$$|DT - jI| \qquad (3.63)$$

gives the characteristic equation when set equal to zero:

$$-j^3 + \beta f'(x) = 0 \qquad (3.64)$$

There are three eigenvalues, the solutions of the equation $j^3 = \beta f'(x)$. There
are two complex conjugate eigenvalues and one real eigenvalue. The three eigenvalues
may be represented as

$$j_1, \quad j_1 e^{2\pi i/3}, \quad j_1 e^{-2\pi i/3}$$

where

$$j_1 = \left( \beta f'(x) \right)^{1/3} \qquad (3.65)$$


is the real root of the equation. From the model, we conclude that

$$|j_i|^3 \le |\beta f'(x)|$$

The system is stable when $|j_i| < 1$, in equilibrium when $|j_i| = 1$, and unstable
when $|j_i| > 1$ (Farmer 1983, Baker 1990, Brown 2000).
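The stability test can be checked numerically by building the Jacobian of (3.61) and computing its eigenvalues directly. This is a sketch only: the function names, the tolerance, and the values beta = 0.2 and f'(x) = 0.9 are assumptions chosen for illustration.

```python
import numpy as np

def jacobian(beta, fprime):
    """DT of (3.61) for feedback fraction beta and response slope f'(x)."""
    return np.array([[0.0,    0.0, beta],
                     [fprime, 0.0, 0.0],
                     [0.0,    1.0, 0.0]])

def classify(beta, fprime, tol=1e-9):
    """Return the largest eigenvalue modulus and a stability label."""
    moduli = np.abs(np.linalg.eigvals(jacobian(beta, fprime)))
    radius = float(moduli.max())
    if radius < 1.0 - tol:
        label = "stable"
    elif radius > 1.0 + tol:
        label = "unstable"
    else:
        label = "equilibrium"
    return radius, label

radius, label = classify(beta=0.2, fprime=0.9)   # illustrative values
```

All three eigenvalues share the modulus $|\beta f'(x)|^{1/3}$, so a single number decides the classification; here it is about 0.565, which falls in the stable range.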


This gives some insight into the structural stability aspect. The control theory
element of the current research model addresses mixing, and structural changes due to
feedback from external nodes. The value of the norm (<1, =1, >1, real, imaginary, etc.) of
the roots of the eigenfunction characteristic equation reflects the assimilation of reality
based on experiences from prior time steps.

From this, we see that for small enough $\beta$ or large enough $u_0$ we can achieve
stability. For the technology transition system, we desire stability and convergence.
With a stable model at the organizational level, we have organizational nodes that are
not thrashing or wasting effort. With stable nodes, we can build a stable infrastructure
composed of those nodes. This will also yield convergence of the technology.

The data that we can measure are the number of messages published at some time
k. We can also measure the output $y_{k+2}, y_{k+1}, y_k, y_{k-1}, y_{k-2}, \ldots$. The output message data are simply the
published data offset by a time step, e.g. $u_{(t-c)}$. The difficulty we have is that the macro data to
empirically support $f'(x)$ cannot be arrived at directly.


Our system curve from empirical data is the output y, which represents u offset by
an interval c from a prior time step. Initially, for the data examined, this interval was one
year. In effect, this provides an immediate memory for chunking of three registers,
because it takes three time steps to clear all of a message when there is a request for
clarification.

As this immediate memory, represented in time steps, is expanded, the error between
the modeled and predicted values should start to diminish. Therefore:

$$Y = y_t = u_{(t-c)} \qquad (3.66)$$

$$X = \beta f(x) + u_t = \beta u_{(t-c)} + u_t \qquad (3.67)$$
Now deriving from (3.66) and (3.67) we get

$$f'(x) = \frac{dY}{dX} = \frac{dY/dt}{dX/dt} \qquad (3.68)$$

The following result was obtained using parametric differentiation of (3.68) and
substituting (3.66) and (3.67):

$$f'(x) = \frac{u'_{(t-c)}}{\beta u'_{(t-c)} + u'_{(t)}} \qquad (3.69)$$

We can substitute $f'(x)$ into (3.65), which defines the real eigenvalue:

$$j = \left[ \frac{\beta u'_{(t-c)}}{\beta u'_{(t-c)} + u'_{(t)}} \right]^{1/3} \qquad (3.70)$$

or explicitly, to enable programming from the data sets,

$$f'(x) = \frac{\dfrac{du_{(t-c)}}{dt}}{\beta \dfrac{du_{(t-c)}}{dt} + \dfrac{du_{(t)}}{dt}} \qquad (3.71)$$
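One way (3.71) might be programmed from a series of message counts is sketched below, with the derivatives replaced by finite differences. The function names, the use of `numpy.gradient`, and the synthetic linear series are assumptions for illustration; the lag c must be at least one sample.

```python
import numpy as np

def response_slope(u, c, beta, dt=1.0):
    """Estimate f'(x) of (3.71) from a message-count series u sampled every dt.

    u[t - c] plays the role of the lagged series; derivatives are finite
    differences. Requires c >= 1.
    """
    u = np.asarray(u, dtype=float)
    du = np.gradient(u, dt)            # du/dt at every sample
    du_now = du[c:]                    # du_(t)/dt for t = c, c+1, ...
    du_lag = du[:-c]                   # du_(t-c)/dt aligned with du_now
    return du_lag / (beta * du_lag + du_now)

def real_eigenvalue(fprime, beta):
    """Real cube root of beta * f'(x), as in (3.65) and (3.70)."""
    return np.cbrt(beta * fprime)

counts = 3.0 * np.arange(10) + 2.0     # synthetic linear message counts, slope 3
slope = response_slope(counts, c=1, beta=0.2)
j = real_eigenvalue(slope, beta=0.2)
```

For a linearly growing series the two derivatives are equal, so $f'(x) = 1/(\beta + 1)$ at every step and the eigenvalue is constant, which matches the earlier remark that the linear fit reduces the derivative to a constant.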


The point where the graph intersects the line y=x is the equilibrium point. The
slope of $y = u + \beta f(x)$ at the fixed point is the real eigenvalue of the matrix DT(X). By
changing the parameter $\beta$, we change the shape of the graph and thus we change the slope
where the fixed point is found. Also, by changing $u_0$, we change the location of the fixed
point along the horizontal axis and thus the eigenvalue. By starting $u_0$ at 0, we first have
a fixed point whose real eigenvalue is positive and less than 1. This is ideal in that it
indicates that the solution will converge to a point where it remains stable and makes
sense. The various characteristics of the eigenvalue, with graphs and interpretations of
their meaning, are reviewed in the
appendix.
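Locating the equilibrium point amounts to solving $x = u + \beta f(x)$. The sketch below does this by bisection for a hypothetical response curve; the form of `f`, its `capacity` parameter, the search interval, and the values of u0 and beta are all assumptions, since the text does not fix a functional form for the node response.

```python
import math

def f(x, capacity=50.0):
    """Hypothetical node response curve with f(0) = 0 and f(x) -> 0 as x grows."""
    return x * math.exp(-x / capacity)

def f_prime(x, capacity=50.0):
    """Derivative of the hypothetical response curve above."""
    return (1.0 - x / capacity) * math.exp(-x / capacity)

def fixed_point(u, beta, lo=0.0, hi=1e3, iters=200):
    """Solve x = u + beta*f(x) by bisection on g(x) = u + beta*f(x) - x."""
    g = lambda x: u + beta * f(x) - x
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid        # the sign change lies in the lower half
        else:
            lo = mid        # otherwise it lies in the upper half
    return 0.5 * (lo + hi)

u0, beta = 5.0, 0.2
x_star = fixed_point(u0, beta)
slope_at_fixed_point = beta * f_prime(x_star)
```

Raising u0 moves the fixed point to the right along the horizontal axis, and changing beta alters the slope there, which is the behavior described in the paragraph above.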

For the moment, let's go back to the model consisting of sender, receiver and
consumer (Figure III-26). Now let's focus on the receiver and look at the inputs and
outputs of this node. It turns out that any of these nodes looks like a receiver in the
general sense. The sender can also be picking up new messages from others, in which
case the sender acts like a receiver. The sender can also be requesting clarification and
be receiving clarification in the same manner as the receiver. Likewise, the consumer
gets inputs and produces outputs. So our model can be seen in Figure III-29 to have all of the
features but represented only in a single node, the receiver. When the "receiver" conjures
up a goal set of objective terms of previously unanswered terms {???} and puts them into
answers {!!!} in the system for the first time, these terms represent $u_k$, or the conditional
probability P(Y|X) and conditional entropy S(Y|X). The mutual information represents
the terms that were previously known to the community, but are now reinforced with
additional instances of the terms. Using the single node version of the model, we also
have a useful sign convention. All of the inputs to the node are positive and outputs are
negative.


[Figure III-29: Software Technology Transition, General Node Inputs and Outputs]

S = send node state (of a unit), typical of an outside signal from earlier time steps
R = receive node state (of a unit), with think and feedback state transitions
C = consumer node (state of a unit)

• $p_1$ is the input from a new publisher in this time step
• $p_2$ is the output, the publication in the time step
• $p_3$ and $p_3'$ are memory in and out
• $p_4$ and $p_4'$ are requests for feedback out of and into the node respectively
• $p_5$ and $p_5'$ are responses to clarification requests out of and into the node respectively

The information is conserved in and out of the node (note sign convention), so

$$p_2 + p_3' + p_4 + p_5 = p_1 + p_3 + p_4' + p_5'$$

Figure III-29. General Node Inputs and Outputs.

We are now in a position to think of an ensemble of nodes. Essentially, a
distribution of these nodes is performing the bakers’ transformation. Just like a physical
system or communication system, we can now speak of a macro stochastic process in
terms of entropy and information.
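The conservation rule of Figure III-29 can be sketched for a small ensemble of nodes. This is a minimal illustration only: the class name `NodeFlows`, its field names and the flow values are hypothetical, chosen to mirror the figure legend (primed flows are written with `_in` or `_out` suffixes).

```python
from dataclasses import dataclass


@dataclass
class NodeFlows:
    p1: float      # input from a new publisher in this time step
    p2: float      # output, the publication in this time step
    p3_in: float   # memory in (p3)
    p3_out: float  # memory out (p3')
    p4_out: float  # requests for feedback leaving the node (p4)
    p4_in: float   # requests for feedback entering the node (p4')
    p5_out: float  # clarification responses leaving the node (p5)
    p5_in: float   # clarification responses entering the node (p5')

    def inputs(self):
        return self.p1 + self.p3_in + self.p4_in + self.p5_in

    def outputs(self):
        return self.p2 + self.p3_out + self.p4_out + self.p5_out

    def net(self):
        """Signed sum with inputs positive and outputs negative."""
        return self.inputs() - self.outputs()


# Two hypothetical nodes; message counts are invented for illustration.
ensemble = [
    NodeFlows(p1=3, p2=4, p3_in=5, p3_out=4, p4_out=1, p4_in=0, p5_out=0, p5_in=1),
    NodeFlows(p1=0, p2=1, p3_in=2, p3_out=1, p4_out=0, p4_in=1, p5_out=1, p5_in=0),
]
balanced = all(node.net() == 0 for node in ensemble)
```

A node whose signed sum is not zero would indicate messages that were created or lost inside the node, which the model does not allow.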

With the compelling evidence of the curve fit data in Figure IV-6, we reevaluated the
eigenvalue function of the control equations using a linear curve fit for messages versus
time step.

$$u_{(t-c)} = mt + b \tag{3.72}$$

where we are computing the messages added to the system at time step t. Since
the equation is linear, the more general form previously developed for a non-linear u(t),
which allowed the interval t-c to vary, has no effect on the additional messages
added to the system in a time step. For our first approximation, the derivative of $u_t$ will
always be a constant, namely the slope m.

Then, taking the derivative $du_{(t-c)}/dt$ in (3.72), we have a constant for j in (3.70).
At this point we wish to tune $\beta$ to determine whether the dynamical control model
stabilizes with a function in the same form as the macroscopic entropy $S_H$.



Let’s go to the basic equation (3.25). Recall our conserved property is messages
N, information in our case. Using Shannon’s entropy $S_H$ and N for the number of
messages we get

$$\frac{1}{T} = \frac{\Delta S_H}{\Delta N} \tag{3.73}$$


From the section dealing with the information theoretic aspects of interacting
subsystems, we saw how T varied with time step. Now we can observe the control
model’s dynamical entropy, $S_B$, as a function of the same time steps. We have the
opportunity to relate the two entropy measures, $S_H$ and $S_B$, since they are related to the
same information system of messages N. We are dealing with the same information
flows, hence the same system, so this seems reasonable. Recall $S_B$ is related to
Lyapunov’s exponent $\lambda$, which comes from the eigenvalue j.

We found the relationship of messages versus time step in Figure III-8 was very
satisfactorily modeled as a linear equation for this technology set. (It could be different
for other technologies; this is why we have dealt with the relationships in terms of
functions, eigenvalues and derivatives.) In this case, the derivative of the linear model
reduced to a constant in equation (3.72) as noted earlier.

Now, instead of using an average or a guess for $\beta$, it is computed directly. To compute
$\beta$, the amount of information that a node consumes which persists in time, note that both
equations are functions of the time step, so we can solve $S_B(k) = j_k$, with $j_k = b_j k^{m_j}$, and
$S_H(k) = b_s k^{m_s}$ for k. Setting them equal to each other, we can solve for $\beta(S_B, S_H)$.

$$\left(\frac{S_H}{b_s}\right)^{1/m_s} = k \tag{3.74}$$

and

$$\left(\frac{S_B}{b_j}\right)^{1/m_j} = k \tag{3.75}$$

which yields

$$S_B(k) = b_j \left(\frac{S_H}{b_s}\right)^{m_j/m_s} \tag{3.76}$$

$$S_H(k) = b_s \left(\frac{S_B}{b_j}\right)^{m_s/m_j} \tag{3.77}$$

Here the subscript s refers to the slope and intercept terms of the Shannon entropy
equation, and the subscript j refers to the similar terms in the $S_B$ bakers’ transformation
equation.

Before we do, let’s explore the relationship to temperature from the discrete,
micro model. Earlier, using a macroscopic approach, we showed that temperature
increases or decreases with increasing or decreasing pressure on a node respectively. In
a physical system, we can address temperature in terms of entropy and the conserved property.
Let’s see that this is true for this discrete, micro formulation as well.

In Figure III-30 we see, on the left-hand side, that there is a transfer function that
converts X, the input, or some percentage of the available persistent input, into Y, the output.
This is really made up of two parts, as seen on the right. We can use a Venn diagram as
introduced earlier. Extensive properties like messages are additive. Probabilities are
multiplicative. This also applies to the entropy.

[Figure III-30: Node Input and Output. Labels: $S_{k+1}$, $S_k$; Joint, Input, Incremental new information]

Figure III-30. Node Input and Output in terms of Entropy, and incremental new
information.

4. Temperature from Discrete Control Model 

Recall equation (3.1). Now that we have a discrete model, we can develop a
relationship that is consistent with the entropic approaches of statistical mechanics.

Let’s assume that we need to line up with equation (3.25). We see that (3.1) can be
written as

$$S_{k+1} = S_k + m_N N_k \tag{3.78}$$

$$N_{k+1} = N_k + m_s S_k \tag{3.79}$$

Here we are suggesting that there is a linear (m) relationship of messages ($m_N N_k$)
in (3.78) that are discovered and added, and similarly that there is an entropy relationship
($m_s S_k$) in (3.79). The respective m for the messages and entropy will also address the
units.


Rearranging (3.78) and (3.79) we get

$$\frac{S_{k+1} - S_k}{N_{k+1} - N_k} = \frac{m_N N_k}{m_s S_k}, \qquad T = \frac{m_s S_k}{m_N N_k}, \qquad \frac{1}{T} = \frac{\Delta S}{\Delta N} \tag{3.80}$$

This makes (3.1) consistent with the definition given in (3.25) for temperature T.
We can now write (3.78) and (3.79) in terms of temperature, since

$$T = \frac{m_s S_k}{m_N N_k} \tag{3.81}$$

$$m_N = \frac{m_s S_k}{T N_k} = m_N(S_k, N_k) \tag{3.82}$$

$$m_s = \frac{T m_N N_k}{S_k} = m_s(S_k, N_k) \tag{3.83}$$
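The coupled updates (3.78) and (3.79) and the temperature relation (3.80) to (3.81) can be checked numerically. The sketch below is illustrative only: the starting values and the coefficients `m_N` and `m_s` are invented, not fitted to any of the technology data sets.

```python
def step(S, N, m_N, m_s):
    """One update of (3.78) and (3.79): returns (S_{k+1}, N_{k+1})."""
    return S + m_N * N, N + m_s * S


def temperature(S, N, m_N, m_s):
    """T from (3.81): T = m_s * S_k / (m_N * N_k)."""
    return (m_s * S) / (m_N * N)


# Hypothetical state and coefficients, for illustration only.
S, N = 2.0, 10.0
m_N, m_s = 0.05, 0.5

S1, N1 = step(S, N, m_N, m_s)
T = temperature(S, N, m_N, m_s)

# (3.80): the inverse temperature equals the ratio of the two increments.
inverse_T_from_increments = (S1 - S) / (N1 - N)
```

With these numbers the entropy increment is 0.5 and the message increment is 1.0, so T = 2 and 1/T = ΔS/ΔN = 0.5, as (3.80) requires.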

These equations are nicely linked through temperature.

$$\overbrace{S_{k+1}}^{x_{k+1}} = S_k + \overbrace{\frac{m_s}{T} S_k}^{\beta y_{k-1}} \tag{3.84}$$

$$N_{k+1} = N_k + T m_N N_k \tag{3.85}$$

This can be written $S_{k+1} = S_k \left(1 + \frac{m_s}{T}\right)$ and $N_{k+1} = N_k (1 + T m_N)$; however, examining (3.84) we can
now better see the relationship to the feedback control model shown in (3.56). We will
elect to have only one tuning parameter, $\beta$, on the entropy equation this time. This way we
can relate the system to maximum entropy. Since the pair of equations is linked via
their second terms, one tuning parameter is simple and sufficient.

Now we relate the equations to the Venn diagram. The mutual information
component I(X;Y) is the part that deals with irreversible entropy, and $S_H(Y|X)$ represents a
reversible entropy. The term $\beta y_{k-1}$ is a way of looking at how much, as a percentage of the
previous body of knowledge, the research node is questioning and restructuring. In the
control model, the receiver node asked for feedback from the earlier senders. The
counterpart to the request for feedback was received as clarification. When the receiver
node reaches out and touches the existing structure of terms in various microstates, this is
the percentage of the Venn diagram represented as mutual information in this time step.
The entire Venn diagram becomes the contribution at this time step to the body of
knowledge, along with the rest of the community’s contribution at this time step. This then
is available as input $y_{k+1}$, or $S_{k+1}(X)$ and $\beta N_{k+1}(X)$, for the next time step.

Here we can look at the total entropy consisting of two parts as seen in Table III-1 
and (3.86). 


Irreversible   dS(irrev)   Mutual Information I(X;Y)          Production   $\beta y_{k-1}$
Reversible     dS(rev)     Conditional Entropy $S_H(Y|X)$     Portation    $u_k$

Table III-1 Model components

$$dS_{total} = \underbrace{dS_{(irrev)}}_{I(X;Y),\ \text{production}} + \underbrace{dS_{(rev)}}_{S_H(Y|X),\ \text{portation}} \tag{3.86}$$
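The split in (3.86) into a mutual information part and a conditional entropy part can be illustrated with the standard identity H(Y) = I(X;Y) + H(Y|X). The sketch below assumes that the total in (3.86) corresponds to the output entropy H(Y); that reading, the function name `decompose` and the joint distribution are assumptions made for illustration.

```python
from math import log2


def decompose(joint):
    """joint[i][j] = P(X=i, Y=j). Returns (H(Y), I(X;Y), H(Y|X)) in bits."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]

    h_y = -sum(p * log2(p) for p in py if p > 0)

    mutual = 0.0       # I(X;Y), the "production" part in (3.86)
    conditional = 0.0  # H(Y|X), the "portation" part in (3.86)
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mutual += p * log2(p / (px[i] * py[j]))
                conditional -= p * log2(p / px[i])
    return h_y, mutual, conditional


# Hypothetical joint distribution of input X and output Y.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
h_y, mutual, conditional = decompose(joint)
```

The two parts are computed independently here, so their sum matching H(Y) is a genuine check of the decomposition rather than a consequence of how they were defined.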

The import or export of an extensive variable might be referred to as “portation”.
This is a change in entropy due to the addition or removal of some of the system’s
extensive property. This is when we add extensive properties across the control volume
boundary and thus increase the bounded control volume. An example of this would be
adding terms to the vocabulary. Then a researcher provides input in the variable $u_k$. This
is, in principle, reversible. Another example, since we now have a relationship of
extensive and intensive variables through temperature, might be adding or subtracting
volume, i.e. more nodes, organizations, authors, etc. This is what can be considered a
“becoming” property.

The production component of the equation deals with the organization, or
rearrangement, of free or available microstates. The mutual information deals with the
part of the system’s entropy that is locked up in the structure of the system. This is a
function of the present organization of the microstates of the primitive terms. When a
researcher moves and combines existing terms in an arrangement that was not previously
populated, we are dealing with a “being” property.

These free states are those that are available for any system to change its future
organization by conversion into chaos (usually heat) or order (usually work). These two
components have an important relationship to the main differences between a learning
organization and knowledge management, which are also typical of science, research and
technology advancements. Hence, we have relevance to technology transition. In first-
and second-generation knowledge management, and technology transition, the focus is on
the porting of knowledge, which happens reversibly. This is typical of traditional
education. In a learning organization, the focus is on the production of knowledge, which
happens irreversibly. This is typical of competency-based education and
self-actualization (Maslow’s highest level), as in the advancement of science during
an effort to achieve a Ph.D. Here we are concerned with using the universal availability of
free energy 21.


5. Temperature and the Partition Function 

We saw how microstates of an alphabet were related to entropy on p. 99. The plot of
the maximum entropy for a number of small alphabets indicates the number of potential
microstates at each q-level for an alphabet of 128 terms. This is the peak in the middle,
centered at q-level 64. Read the microstates for this curve only on the left-hand y-axis.
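The location of this peak can be reproduced with a short calculation. The sketch assumes that the potential microstates at a q-level are the distinct sets of q terms drawn from the alphabet, that is, the binomial coefficient C(n, q); this counting rule and the function name are assumptions for illustration.

```python
from math import comb, log2


def microstates_per_q_level(n_terms):
    """Number of distinct q-term sets for q = 0 .. n_terms."""
    return [comb(n_terms, q) for q in range(n_terms + 1)]


counts = microstates_per_q_level(128)
peak_q = max(range(len(counts)), key=counts.__getitem__)  # q-level with most sets
total_sets = sum(counts)                                  # size of the power set
max_entropy_bits = log2(total_sets)
```

Under this counting rule the peak falls at q-level 64 and the power set contains 2^128 sets, which corresponds to a maximum of 128 bits if every set is equally likely.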


21 Implications of a natural evolution: entropy always increases, toward stringing concepts to signed
numbers so that the more complex our conceptualization becomes, the less our confusion (complication)
becomes. We are moving toward knowing and being. This paragraph resulted from a chat room
discussion on the web.

Temperature is related to the free or available microstates relative to the maximum
entropy of the vocabulary.

Regardless, we can still look at the free “energy” (i.e. conserved property) states,
that is, the available states which terms could populate. Recall (2.9), where U
represents the internal structure. Chemists actually have this well figured out. Thanks to
Gibbs, they view components (our messages) as each having a definite structure allowing a
definite reaction mechanism. They work with $\Delta F = Q + W + ? + ?? + ??? + \ldots$, and a machine
view is $\Delta E = Q + W + ? + ?? + ??? + \ldots$, where the original construct allowed for adding any
yet undiscovered method of converting energy 22.

If the internal structure has available free microstates, we can stimulate the system
to populate various q-levels with sets of sets. Then we can use the partition function
related to the microstates of each q-level to determine the temperature.

This is very convenient, since the partition function is the most useful
equation in statistical mechanics and it contains the temperature term we desire.

$$P(q_i) = C e^{-q_i/kT} \quad \text{Boltzmann-Gibbs} \tag{3.87}$$

If we sum over all of the q-levels, we get $\Omega_c$, given by

$$\Omega_c = \sum_{i \in \Omega} e^{-q_i/kT} \quad \text{Partition function} \tag{3.88}$$

where $q_i$ is the property to be conserved, T is a temperature, k is a constant for unit
conversion, and C is a normalizing constant. It turns out that the normalizing constant
C = 1/T. To get the units right for the conserved property with a constant volume, T = n/V,
or messages per node. It would appear that this conserved property can be records (a rich
message), message primitives with a single term distribution, or message primitives
distributed in sets of sets, or q-levels.


22 We know that magnetic fields and radiation fields at one time were not understood to influence the
conversion of free microstates. Could we possibly someday see an information field theory, based on
statistical mechanics, information theory, and software physics?

Here is how we relate this to messages and nodes. Using Boltzmann-Gibbs, we
know that $q_i$ is the sum of the primitive messages (n) in the bins.

We know that the total number of messages n is distributed over V nodes
(authors). So, following Boltzmann’s logic, if this were continuous, we would have $q_i$
distributed over an infinite number of bins of infinitely small size (dq), and we would have

$$\int_0^\infty P(q_i)\,dq = 1 \tag{3.89}$$

and from the normalization condition, which takes all of the primitive messages $\Omega = n$
distributed over all of the nodes V,

$$\int_0^\infty q\,P(q_i)\,dq = \Omega/V = n/V \tag{3.90}$$

we find that

$$C = 1/T \tag{3.91}$$

and

$$T = \Omega/V \tag{3.92}$$

where $\Omega = n$ primitive messages (sets of sets of terms). We don’t have an infinite
number of bins; rather, our bins are countable and numbered 1 through $2^T$, where
T = {terms} and $2^T$ = {messages}.
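The two continuous conditions (3.89) and (3.90) can be checked numerically for the Boltzmann-Gibbs form with k = 1. The sketch below is illustrative: the message and node counts are invented, the midpoint-rule integrator is a simple stand-in, and the infinite upper limit is replaced by a large finite value.

```python
from math import exp


def boltzmann_gibbs(q, T, k=1.0):
    """P(q) = C * exp(-q / (k*T)) with C = 1/(k*T); C = 1/T when k = 1 (3.91)."""
    return exp(-q / (k * T)) / (k * T)


def integrate(f, lo, hi, steps=200_000):
    """Midpoint-rule approximation of the integral of f on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Hypothetical counts, for illustration only.
n_messages, n_nodes = 1200, 40
T = n_messages / n_nodes      # (3.92): messages per node
upper = 50 * T                # finite stand-in for the infinite upper limit

total_probability = integrate(lambda q: boltzmann_gibbs(q, T), 0.0, upper)
mean_q = integrate(lambda q: q * boltzmann_gibbs(q, T), 0.0, upper)
```

With C = 1/T the distribution integrates to one and its mean equals T = n/V, which is the pair of results stated in (3.91) and (3.92).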

We can also maximize the entropy, which occurs when there is an equal distribution of all of the
terms, $S = -\sum p(q) \log_2 p(q)$, which is

$$S_{max} = -\log_2 \frac{1}{\Omega} \tag{3.93}$$

This is the analog to free energy in statistical physics.

$$F = -kT \ln \Omega \tag{3.94}$$

This is the available, or yet unoccupied, microstate, not already tied up in the
structure.

Depending on which type of message is chosen, you will get a different value for
the conjugate, Legendre-developed intensive variable. Each will give a different
temperature. The true temperature in fundamental information units must be computed on the
basis of the n-tuple pair-wise sets-of-sets combinations, which are allocated to q-levels.

Since each granularity of message has a different temperature term $\beta = kT$, we
need to define the specific heat, or the heat capacity, $C_v$, in bits relative to the true
temperature. Fortunately, heat capacity in bits has been developed by Fraundorf
(Fraundorf 2000). Here is a summary. In a continuous system we would say,

$$dS = \frac{dQ}{T} \ \text{for no work, or} \ C_v = \frac{dQ}{dT} \tag{3.95}$$

$$\int dQ = \int C_v\,dT \tag{3.96}$$

Since T > 0, when $T \to 0$, $Q \to 0$, so

$$\xi = \frac{Q}{kT} = \frac{1}{kT}\int_0^T C_v\,dT = \frac{1}{k\,\Delta T}\int C_v\,dT = \left\langle \frac{C_v}{k} \right\rangle \tag{3.97}$$

where $\langle C_v/k \rangle$ is the heat capacity in bits, averaged over temperatures ranging between T
and absolute zero; k is unit preserving and relates the higher-level messages to the
fundamental primitive message measurement for the temperature. This means we are
able to relate heat capacity in bits to enable comparison of different measures of the
conserved property: messages as records, messages as single primitive terms, or
messages as the true combinatorial set of sets of primitive single terms. This is adjusted
using k in the temperature term $\beta = kT$. For the set of sets of true primitive terms n, k = 1.

For this model, and likely for most systems whose internal structures can be
represented with n-tuple q-levels of sets that can be developed from binary combinations,
the appropriate granularity is sets of sets of message primitives n. The use of sets of sets
of primitive messages also has the property of relating to the multiplicity of microstates,
which can be directly related to the entropy by taking the $\log_2$. Further, a
reasonably small alphabet, with a vocabulary consisting of as few as 32 terms, gives a
nice statistical sample set with a few messages.

This distribution function is also closely related to the more general Weibull
probability distribution function.

$$P(q_i) = e^{-\left(\frac{q_i + \gamma}{\beta}\right)^{\alpha}} \quad \text{Weibull Distribution} \tag{3.98}$$

We see $\beta = kT$ for the temperature term in equation (6.1). Hence, we have a
relationship to temperature from the microstate distribution at a given q-level.

We recall that q-levels represent a set of q-level sets. A set composed from a pair of
subsets, {AB}, is q-level = 2; a set made of three subsets, {ABD}, is q-level = 3; and so on. The
more combinations, the greater $\Omega$, the more complex, and the higher the temperature.

We can only go so far with the analogy to a physical system. Information and
messages are unlike a physical system. Our software physics, based on information
theoretic mechanisms, has to differentiate over all of the various states of the q-levels.
For example, each set such as {AB} is different and will occur with a different probability
in q-level = 2. In a physical system, if two particles are at energy state q = 2, they are
indistinguishable, and all of the particles at that q-level have the same probability. This is
what permits Boltzmann’s equation $S = k \ln W$ to lose the summation operation seen in
Shannon’s famous equation. In the Shannon entropy equation,
$S_H = -\sum_{x \in E} p(x) \log_2 p(x)$, the coefficient when summing equal probabilities would end up
being equal to one. The partitioning could then just be across the energy levels. Although
obvious now, this was a difficult problem to resolve in order to get the temperature to
compute properly from both Shannon’s view and the Boltzmann-Gibbs partition function view.
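The difference between the two views can be illustrated numerically. The sketch below first shows the Shannon sum collapsing to $\log_2 W$ for equal probabilities, then compares the entropy obtained when every set keeps its own probability with the entropy obtained when only the q-level bins are distinguished. The sets and their probabilities are hypothetical.

```python
from math import log2


def shannon_entropy(probs):
    """S_H = -sum p * log2(p), skipping zero-probability states."""
    return -sum(p * log2(p) for p in probs if p > 0)


# Equal probabilities over W states: the sum collapses to log2(W).
W = 8
uniform_bits = shannon_entropy([1 / W] * W)

# Hypothetical sets grouped by q-level, each with its own probability.
sets_by_q_level = {
    2: {"AB": 0.30, "AC": 0.10, "BD": 0.10},
    3: {"ABD": 0.25, "ACD": 0.05},
    4: {"ABCD": 0.20},
}
per_set = [p for level in sets_by_q_level.values() for p in level.values()]
per_level = [sum(level.values()) for level in sets_by_q_level.values()]

fine_bits = shannon_entropy(per_set)      # every set keeps its own probability
coarse_bits = shannon_entropy(per_level)  # only the q-level bins are distinguished
```

In this example the per-set entropy is larger than the per-level entropy, so partitioning across q-levels alone would understate the entropy of the system.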

In the proposed approach, while we bin all the sets of two, sets of three, etc.,
each set in the bin also has its own probability. If we did not do this, we would
significantly underestimate the internal structural complexity, and hence the entropy, of
the system.

The temperature term, kT, turns out to be a constant for the system at a time step,
which is driven by $\alpha$. This is visible in the appendix, where the Weibull function is
linearized to develop the curve fit.
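One common way to linearize a Weibull form is to take logarithms twice, so that $\ln(-\ln P)$ becomes a straight line in $\ln q$ with slope $\alpha$ and intercept $-\alpha \ln \beta$. The sketch below assumes this linearization and a zero shift term; the appendix may proceed differently, and the data are synthetic values generated from known parameters.

```python
from math import exp, log


def weibull(q, alpha, beta):
    """P(q) = exp(-(q / beta) ** alpha), with the shift term taken as zero."""
    return exp(-((q / beta) ** alpha))


def fit_linearized(qs, ps):
    """Least-squares line through (ln q, ln(-ln P)).

    The slope estimates alpha; the intercept equals -alpha * ln(beta).
    """
    xs = [log(q) for q in qs]
    ys = [log(-log(p)) for p in ps]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    alpha = sxy / sxx
    intercept = mean_y - alpha * mean_x
    beta = exp(-intercept / alpha)
    return alpha, beta


# Synthetic data from known parameters (alpha = 1.5, beta = 4.0).
qs = [1, 2, 3, 4, 5, 6, 7, 8]
ps = [weibull(q, alpha=1.5, beta=4.0) for q in qs]
alpha_hat, beta_hat = fit_linearized(qs, ps)
```

Because the synthetic points lie exactly on the linearized curve, the fit recovers the generating parameters; with real data the recovered $\beta$ would play the role of the temperature term kT.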

We now have, in hand, a method to develop temperature in four ways. 

1. From (3.25), where we look at the slope of the change of microstates to the
change in the conserved property of two interacting subsystems.

$$\frac{1}{T} = \frac{\Delta S}{\Delta n} \tag{3.25}$$

2. The second approach looks at the dynamical system model. Here a pair of
dynamical equations, related through (3.80), represents the discrete interactions and seems
to yield a relationship to temperature. We can partition down the macroscopic world to represent
trajectories of microcanonical ensembles and their probability distributions at the node
interaction level.

$$S_{k+1} = S_k + m_N N_k \tag{3.78}$$

$$N_{k+1} = N_k + m_s S_k \tag{3.79}$$

$$T = \frac{m_s S_k}{m_N N_k} \tag{3.80}$$

3. The third approach is through available occupancy microstates related to
maximum entropy and the partition function, (3.88).

$$P(q_i) = C e^{-q_i/kT} \quad \text{Boltzmann-Gibbs} \tag{3.87}$$

If we sum over all of the q-levels, we get $\Omega_c$, given by

$$\Omega_c = \sum_{i \in \Omega} e^{-q_i/kT} \quad \text{Partition function} \tag{3.88}$$

and the more general distribution in the form of the Weibull function,

$$P(q_i) = e^{-\left(\frac{q_i + \gamma}{\beta}\right)^{\alpha}} \quad \text{Weibull Distribution} \tag{3.98}$$

4. Closely related to all of these is the apparent relationship of temperature being
proportional to pressure, where pressure is in terms of the conserved property per unit
volume, or messages per node. This was seen in (3.34).

$$P(k) = \frac{m_P}{m_T}\left(T(k) - b_T\right) + b_P \tag{3.34}$$

This is dimensionally correct, as we saw from the partition function normalizing
condition.

We also relate heat capacity in bits to enable comparison of different measures of
the conserved property. This is adjusted using k in the temperature term $\beta = kT$. We see
this in (3.97), $\langle C_v/k \rangle$, which is the average heat capacity over temperatures ranging
from T to absolute zero. This is valuable, since we can determine the heat capacity for a
technology as we observe a sample over time. This then permits us to use the heat
capacity to predict the number of nodes that must produce in order to get to our desired
end state.


6. Relationship of Macro and Micro through the bakers’ transformation

All of the pieces have now been developed. Let us bring it together using the
bakers’ transformation. Equations (3.84) and (3.85) can also be written

(3.100)

This is of the form of the bakers’ transformation. In this more general case,
where $p_s$ is a rational number

(3.101)

instead of

$$0 < x_t < \ldots \tag{3.102}$$

and $\ldots < x_t < 1$ we have

$$\left(1 + \frac{m_s}{T}\right) < S_k < 1, \qquad (1 + T m_N) < N_k < 1 \tag{3.103}$$

Here $p_s = 1 + \frac{m_s}{T}$, and we can see the relationship of (3.78) or (3.84), (3.79) or
(3.85), and (3.80) to reversible and irreversible, portation and production (3.86), mutual
information and Bayesian conditional probability, and chaos and order. The bakers’
transformation is related to a unit square with Euclidean distances. In our case, the
control volume defines the unit square. We have the phase space representation of the
mapping of (3.101), which locally expands horizontal segments by a factor of $1/p_s$ and contracts
vertical (stable) ones by $p_s$. These are the chaos and order components respectively.
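The stretching and contracting behavior can be seen in a small sketch of a generalized bakers’ map on the unit square. The form used here is the common textbook one with a single split parameter `p`, and it is an assumption: it is not necessarily the exact form of (3.101), and the parameter and sample points are illustrative. Exact fractions are used so the factors can be read off without rounding.

```python
from fractions import Fraction


def baker(x, y, p=Fraction(1, 2)):
    """Generalized bakers' map on the unit square (textbook form, assumed).

    The strip x < p is stretched horizontally by 1/p and squeezed vertically
    by p; the remaining strip is stretched by 1/(1-p), squeezed by (1-p),
    and stacked on top.
    """
    if x < p:
        return x / p, p * y
    return (x - p) / (1 - p), p + (1 - p) * y


# Two nearby points in the left strip, split parameter p = 1/3.
p = Fraction(1, 3)
a = (Fraction(1, 10), Fraction(1, 5))
b = (Fraction(1, 5), Fraction(2, 5))

a1, b1 = baker(*a, p=p), baker(*b, p=p)
horizontal_growth = (b1[0] - a1[0]) / (b[0] - a[0])   # expansion factor 1/p
vertical_growth = (b1[1] - a1[1]) / (b[1] - a[1])     # contraction factor p
```

With `p = 1/2` this reduces to the standard bakers’ transformation, in which each step doubles horizontal separations and halves vertical ones.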

The bakers’ transformation is related to Bernoulli shifts (Prigogine 1989 p202).
The simplest class of Kolmogorov systems (Elskins 1986) is Bernoulli shifts. The
relationship between dynamical systems and information theory (Shannon 1948;
Jaynes 1957) is known and is directly exploited here as the foundation of technology
transfer dynamics (TechTx) and the foundation for software physics.

It also makes sense. Now we see that we are defining an evolutionary process.
There appears to be a temperature, which we can represent in bits. We can define a
specific heat for the entities in question. The process is really a program. The
program takes information in, and the length of the program and the entropy will be
determined by the maximum entropy, the point where every state is known.

The idea that Kolmogorov has is that there are objects and there are descriptions
(encodings) of objects, and the complexity of an object is the minimal size of this
description. If we have one publisher, and the publisher encodes a message, we can sum
all of the publishers and messages (a countable number) and say some real things about
the ensemble of messages (objects) and publishers (elemental control model nodes). This
can be represented by a program (this process) of finite length for the nodes and
messages generated.

On an intuitive level (per Uspensky), the elements of a “space can be taken as
informations, and $v_k \le v_{k+1}$ means that the information $v_{k+1}$ is a refinement of the
information $v_k$ (and hence $v_{k+1}$ is closer to some limit value to which both $v_k$ and $v_{k+1}$
serve as approximations).” This even sounds like technology maturation.

In the appendix, we also see how the mean squared fluctuation of a
property is related to the free “available” microstates. Future experimental research
can explore the actual values in the relationships for various technologies and evolving
processes.

IV. DATA ANALYSIS AND VALIDATION


Data has been collected on a sample of 50,744 raw messages for the seven 
technologies identified below. For purposes of exposition of the data to validate the 
model, the case of Ada is reviewed in detail. Java is summarized and plotted. 


Technology                          Messages  Final     Terms    Instances  Confidence   Years
                                    (raw)     Messages                      Interval ±
Ada Experiment 1                    6,023     3385                          1.7%         22
    Experiment 2 - N                          4195      1460     17,347     0.76%
    n singles                                                    17,347     0.76%
    n                                         "         74,735   118,141    0.3%
Java - N                            6,307     4852      2421     26,309     0.6%         6
    n                                         "                  272,773    0.2%
Abstract Data Types                 567       567       364      1949       2.3%
                                                                 8457       1.1%
Rate Monotonic Analysis             223       223       342      1079       3.0%
                                                                 6400       1.3%
Software Cost Models                273       273       394      1134       3.0%
                                                                 7131       1.2%
Software Work Breakdown Structures  36        36        63       134        8.6%
                                                                 567        4.2%
Software Technology Transfer        257       257       222      1041       3.1%
                                                                 6996       1.2%

Table IV-1 Technologies Examined


Ada provided the basis for a number of experiments. In the Ada experiment 1
sample, there were 3,385 source records (messages), with 1,460 terms (the alphabet size)
measured in 13,554 experimental data instances for the calculation of the actual entropy
contribution of a message. The model predictions are of entropy at the macro level, and
the microstates of the term arrangements are the basis for computing the sample
distribution, the error and the confidence interval. The result is that the error and confidence
intervals are very small when the sample size is in the thousands.

The experiment indicated with “N” studied the effects of messages as complete
records. In the Ada “n_singles” study, single primitive terms were considered the
messages. In the last Ada study, “n”, the combinations of sets of terms in a record
identifier were considered primitive messages.

Once it was established that n (the extensive variable) was the approach that best 
represented the intensive variable temperature, then all of the technologies were studied 
with distributions of sets of sets. 

Recall that a positive Lyapunov exponent indicates chaos, not convergence. So
we could have technologies which result in a Lyapunov exponent that is positive. For
those cases, we need to know the initial data for time step 0 with an accuracy of N+k
places in order to determine a result with an accuracy of N digits after k iterations.
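The N+k requirement can be made concrete with the decimal shift map x → 10x mod 1, a Bernoulli shift with a positive Lyapunov exponent. It is used here only as a stand-in for a chaotic technology trajectory; the map, the helper names and the starting value are illustrative assumptions.

```python
from fractions import Fraction


def shift_map(x):
    """Decimal Bernoulli shift: each iteration discards the leading digit."""
    return (10 * x) % 1


def iterate(x, k):
    for _ in range(k):
        x = shift_map(x)
    return x


def truncate(x, digits):
    """Keep only the first `digits` decimal places of x."""
    scale = 10 ** digits
    return Fraction(int(x * scale), scale)


x0 = Fraction(123456789, 10 ** 9)   # true initial state 0.123456789
N, k = 3, 4

true_result = truncate(iterate(x0, k), N)
from_n_plus_k = truncate(iterate(truncate(x0, N + k), k), N)
from_k_only = truncate(iterate(truncate(x0, k), k), N)
```

Starting from N+k = 7 known places reproduces the first N = 3 digits after k = 4 iterations, while starting from only k places leaves no correct digits at all.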

A. EXPERIMENT 1 (SENSITIVITY TO ANOMALOUS DATA)

1. TechTx Basic Entropy Macro Level Data and Analysis

For the validation of the TechTx Basic Entropy model, basic curve fitting is
performed. The least squares method was used. Comparison of the sum of the squared
residuals gives us an $R^2$ value to determine goodness of fit.
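The curve fitting step can be sketched for the simplest case, a straight line through message counts per time step, with goodness of fit taken as the usual coefficient of determination (one minus the ratio of the residual sum of squares to the total sum of squares). The function name and the sample counts are hypothetical and are not taken from the Ada data.

```python
def fit_line(ts, us):
    """Ordinary least squares fit of u = m*t + b; returns (m, b, r_squared)."""
    n = len(ts)
    mean_t, mean_u = sum(ts) / n, sum(us) / n
    sxy = sum((t - mean_t) * (u - mean_u) for t, u in zip(ts, us))
    sxx = sum((t - mean_t) ** 2 for t in ts)
    m = sxy / sxx
    b = mean_u - m * mean_t
    ss_res = sum((u - (m * t + b)) ** 2 for t, u in zip(ts, us))
    ss_tot = sum((u - mean_u) ** 2 for u in us)
    return m, b, 1 - ss_res / ss_tot


# Hypothetical message counts at six consecutive time steps.
time_steps = [0, 1, 2, 3, 4, 5]
messages = [12, 15, 19, 20, 25, 27]
slope, intercept, r_squared = fit_line(time_steps, messages)
```

The same residual comparison applies to the entropy fits; only the model function inside the residual changes.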

A discussion follows on the implications of the baseline model and the TechTx 
Basic Entropy model. The predictive strengths of each are presented. The other 
technology areas studied are then compared to the baseline model using the TechTx Basic 
Entropy. 

2. Data Source and Analysis Tools 

All of the data came from the IEEE INSPEC database for Physics, Electronics
and Computing. It is a well-indexed database and has comprehensive coverage of the
field. This database corresponds to the three print publications: Physics
Abstracts, Electrical and Electronic Abstracts, and Computer and Control Abstracts
(http://library.dialog.com/bluesheets/htmla/bl0004.html). This family of science abstracts
began publication in 1898. There are approximately 4,100 journals and serials scanned,
of which 750 are abstracted cover to cover. This constitutes 82% of the database,
including 6% from conference papers reported in journals. Another 16% comes from
conference proceedings. Books, reports, and dissertations are also covered.

The IEEE INSPEC database is an appropriate source for messages on a
technology in the software engineering field. Should a technology outside of
this area be of interest, another set of databases should be explored. INSPEC uses a controlled
vocabulary from the INSPEC thesaurus. A single classification scheme is used for all
records from 1969 to the present. The IEEE INSPEC database is updated 50 times per
year. Each update averages about 6000 records.

The data was collected from the INSPEC database using the Naval Postgraduate
School access to the Cambridge Scientific Abstracts version of the INSPEC database from
June through September 2000. To reproduce the data, any search engine that searches the
INSPEC database should suffice. The raw data was processed by the US Army’s open
source intelligence engine TAOS. The TAOS system is available to the Army Tank
Automotive Research, Development and Engineering Center. The TAOS version used is
identical to Tech OASIS version 2.3a. Information on this system can be found on the
VantagePoint web site at www.searchtech.com. For additional information on the TAOS
system, see (Watts 2000, Porter 2000, Porter 2000a, Porter 2000b, Porter 2000c). Contact
NextGenSoftware@TACOM.Army.mil and ask for the program manager. The point
of contact for TAOS is Mr. Robert Watts.

This engine takes records produced by a simple regular expression application
from a set of tagged data, parses the data, and identifies any field indicated in the regular
expression schema. TAOS 23 was used to identify duplicate records, i.e. messages, in the
context of this research.

B. ADA 


23 <http://www.searchtech.com>. The current version of Tech OASIS is 2.3a. VantagePoint and
TAOS (Tech OASIS) are identical through the current version.

In experiment 1, 22 years of Ada data were drawn from the period 1979 to 2001.
This data set incorporated a deliberate wobble, in terms of noisy data, in order to reflect
real-world effects. The data for 2001 was left out of the analysis since it was not a
complete year. The entropy data contains 32,076 data points based on 1458 terms, for
3385 non-duplicated records.

In experiment 2, over the same period, there are 34,862 data points based on 1583
terms, with 17,592 instances, in a total of 4249 non-duplicated records. This resulted in
117,637 state points for the sample distribution. The comparison of the message-counting
method and the TechTx Basic Entropy model is done with the experiment 1 data set.

The equations in Chapter III can be easily adjusted to represent a portfolio of 
technologies; however, this is beyond the scope of this dissertation. The detailed data for 
Ada to compute the entropy is shown in the appendix. We need to understand that the 
curves we get for the data are local to the technology and vocabulary of the technology in 
question. We would not expect to see the same coefficients, exponents or intercepts for 
another technology. 

1. Data and Method to Retrieve and Reduce Data 

The data collected for Ada is typical of the method used for all of the other data 
sets and will be explained here. 

In the case of Ada, only the term “Ada” was searched for anywhere in the 
database record. All of the records referenced were retrieved. There were 3385 unique 
records found in the database from 1969. Although the search was not refined by 
limiting the terms searched for, the first record that included the term “Ada” and dealt 
with “software” was in 1979. Prior to 1979, Ada referred to a number of different 
acronyms unrelated to the technology in question. A good search run by an information 
specialist or special librarian would throw out these “false drops”. This stresses the 
importance of a good search strategy and of identifying only relevant messages on the 
technology to be assessed. 

The raw data was examined for duplicate records. These duplicates were 
removed. A regular-expression application parsed the data, which was delimited by 
field tags and easily identifiable subfield delimiters, and put it into a flat-file format. This 
format was readily examined by TAOS. The records were collected into time step bins. 
These bins were aggregated in annual, monthly and weekly time steps. 

The first review of the annual data was done by publication year (PY). This is a 
field assigned by INSPEC and entered by the indexers from the source document. Initial 
studies were all done using the publication year as the time step. From the confidence 
discussion in the next section, publication year time step bins are felt to be suitable for 
general-purpose use of the model approach described. For experimental purposes, more 
refined studies were done. During the later stage of experimentation, time step bins were 
identified from INSPEC accession number ranges. These ranges were determined from 
the INSPEC database by limiting the search. Here is an example search statement on the 
Dialog® (www.dialog.com) information retrieval system. 

S ud=199701 w1 

• This gives you an update of the number of records added in 
"1997" during month "01" and week 1 "w1". 

• To see the accession numbers, you can display the first and last 
accession number. 

d s1/1/Total = 

• This statement will give you the least recent (first one added) 
accession number in the set. "s1" is the set number, "1" is the 
code to display the accession number and "Total" is the total number 
of records in the set. 

d s1/1/1 = 

• This statement will give you the most recent (last one added) 
accession number in the set. "s1" is the set number, "1" is the 
code to display the accession number and the second "1" is the 
first record in the set. 

This approach was used to develop annual time step bins using accession 
number ranges. While ranges of accession numbers for monthly and weekly time 
periods can be identified precisely, doing so for every time step is a most 
time-consuming process. Therefore, an approximation was made for monthly or weekly 
intervals using accession number (AN) ranges. In each case, the annual accession 
number ranges were divided into 50 and 12 equal parts for week and month time steps, 
respectively. 
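
The equal-parts approximation described above can be sketched as follows. This is a 
minimal illustration, not the procedure actually run for this research: the function 
names and the accession numbers are hypothetical, and the only thing taken from the 
text is the division of one year's accession number range into 12 (monthly) or 50 
(weekly) equal parts. 

```python
def split_accession_range(first_an, last_an, parts):
    """Split the inclusive range [first_an, last_an] into `parts` near-equal bins.

    Returns a list of (start, end) tuples, both ends inclusive.
    """
    if parts <= 0 or last_an < first_an:
        raise ValueError("need parts > 0 and last_an >= first_an")
    total = last_an - first_an + 1
    bins = []
    start = first_an
    for i in range(parts):
        # Spread any remainder over the first bins so sizes differ by at most one.
        size = total // parts + (1 if i < total % parts else 0)
        end = start + size - 1
        bins.append((start, end))
        start = end + 1
    return bins


def bin_index(accession_number, bins):
    """Return the index of the bin containing the accession number, or None."""
    for i, (start, end) in enumerate(bins):
        if start <= accession_number <= end:
            return i
    return None


# Hypothetical first and last accession numbers for one year (not INSPEC values).
monthly_bins = split_accession_range(5_000_001, 5_120_000, 12)
weekly_bins = split_accession_range(5_000_001, 5_120_000, 50)
print(len(monthly_bins), len(weekly_bins))
print(bin_index(5_010_001, monthly_bins))
```

Each record can then be assigned to a monthly or weekly time step bin from its 
accession number alone, at the cost of assuming that records are added at a uniform 
rate through the year. 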

2. Interpretations of Data (Ada) 

During experiment 1, the observed data had some wobble around 1985. This was 
the result of no records being captured from INSPEC for that year during our initial 
search. This wobble provided some interesting insights. The initial reaction was to 
throw the data out and start over. While we did collect more data, the wobble enabled 
visibility into the effects on the models caused by gaps in data. The wobble also seemed 
to represent the type of information that a practitioner would get as well, when not in the 
sterile conditions required by an experiment. While in a production system, we might 
want to “take what you get”, for the purposes of sorting out the model and early usage, 
the data was closely examined. For each year without data, we averaged the data for the 
three years prior and two years after, as an estimate for the value of 1985. These 
adjustments have the same effect on all of the model studies, as you will see. 
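
The gap estimate described above can be sketched as a simple neighbour average. This 
is only an illustration under stated assumptions: the text does not say whether the 
five neighbouring years were weighted, so an unweighted mean is assumed here, and the 
function name and the annual counts are hypothetical. 

```python
def fill_gap(counts, year, before=3, after=2):
    """Estimate a missing year as the unweighted mean of neighbouring years.

    `counts` maps year -> value, with None (or a missing key) for gaps.
    Uses `before` years prior and `after` years following the gap.
    """
    neighbours = [year - k for k in range(1, before + 1)]
    neighbours += [year + k for k in range(1, after + 1)]
    values = [counts[y] for y in neighbours if counts.get(y) is not None]
    if not values:
        raise ValueError("no neighbouring data available")
    return sum(values) / len(values)


# Hypothetical annual record counts with no data captured for 1985.
annual = {1982: 40, 1983: 60, 1984: 80, 1985: None, 1986: 120, 1987: 150}
annual[1985] = fill_gap(annual, 1985)
print(annual[1985])
```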

In experiment 2, pure data was collected to better refine the validation of the 
TechTx Basic Entropy model. 


3. Traditional Model - Message-Counting 

Figure IV-1 illustrates the Ada data using the traditional method of Rogers 
(Rogers 1983, 1995), generally used for diffusion of innovations. This is also the method 
used by the researchers at Carnegie Mellon University, in Shaw’s briefing on software 
architectures (Shaw 2001). The regression on the message count vs. time for Ada, using a 
least squares fit, achieves an R² of 0.97. While this is usually considered a reasonable R² 
value, we shall see that the entropy approach also has a good fit and that, used together, 
both provide predictive capability. Figure IV-2 shows the ability to project the future with 5 
years and 10 years of data using the linear form fitted to the points marked with triangles. 

For purposes of a cursory comparison of the two approaches, the entropy data is 
shown as circles and plotted against the secondary Y-axis. The secondary Y-axis scale 
was carefully chosen so that the two series’ final years of data nearly coincide. This 
permits a gloss-over discussion of the shape and influence of the data. This gloss seems 
to suggest that both models are subject to the same data anomalies. 
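
The least squares regression of message count against time can be sketched as below. 
This is a generic ordinary least squares fit with R², not the tool used to produce the 
figures; the function name and the message counts are hypothetical and are not the Ada 
data. 

```python
def linear_fit(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical cumulative message counts for time steps 1..6.
steps = [1, 2, 3, 4, 5, 6]
cumulative_messages = [4, 25, 61, 118, 170, 230]
slope, intercept, r_squared = linear_fit(steps, cumulative_messages)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.3f}")

# Project a later time step from the fitted line.
projected = slope * 8 + intercept
print(round(projected, 1))
```

Fitting only the first 5 or 10 years and evaluating the line at later time steps gives 
projections of the kind plotted in Figure IV-2. 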

A casual examination of the data seems to suggest an “S” type curve. It slowly 
starts, ramps up to nearly linear in the center section of the data and starts to tail off at the 
end. The tail off at the end could be explained by the fact that there is some lag in the 
publication, indexing, and database update process. For example, the last full year may 
not have all of the records posted from the prior year when the data was collected in June. 
Although 2001 is not in the data set, the most recent year for which the data could have 
a PY date certainly could not have all of its records posted. A study of this lag could be 
made to explain the message-counting shape more thoroughly. The same process-lag 
effects influence the entropy data. Both the message count and entropy data are 
influenced by the wobble in the data around 1985. In both cases, it tends to propagate 
into the future, since both models use cumulative information. Using the cumulative 
approach seems appropriate since the messages are persistent and available to all 
future researchers to examine. 

One seemingly problematic area with these message-counting linear models is 
that the Y intercept is a negative number. This implies, that at time zero, there is a 
negative number of messages. While that is not possible, it could be suggesting there is 
some prior experience that will soon break loose. Prior learning in the entropy model 
seems to be more visible in that the entropy starts out driven by terms that existed prior to 
the subject technology introduction. This can be seen in Table IV-2. We know these 
were preexisting technology terms in Ada’s heritage. A discussion of the fit for the 
TechTx Basic Entropy model follows the message-counting traditional model. 

Year  Term 
1980  Ada 
1980  software-portability 
1981  software-engineering 
1981  programming 
1981  military-computing 
1981  standards 
1981  operating-systems-computers 
1981  multiprocessing-programs 
1981  parallel-processing 
1981  synchronisation 
1981  computer-architecture 
1981  multiprocessing-systems 
1981  microprocessor-chips 
1981  microcomputers 
1981  military-equipment 
1981  virtual-storage 
1981  microprogramming 

Table IV-2 Terms Identified in the Entropy Model for 1980 and 1981 (Years 2 and 3) 


[Figure: "Traditional Method -- Count the Messages"; x-axis: Years] 

Figure IV-1 Traditional Model - Message-Counting 

[Figure: "Traditional Method -- Count the Messages"] 

Figure IV-2 Traditional Model - Projections Using Message-Counting Approach 

4. Improved TechTx Method - Basic Entropy Model 

The entropy approach is driven by the terms contained in the messages. The data 
and trends in the distribution of the top 100 message terms for the Ada example are shown 
in Figure IV-3. This figure shows the cumulative entropy of the top 100 terms used in the 
messages distributed over time, with the start time step of the data set at the back wall. 
The terms were sorted by their instance frequency. This is loosely related to the 
information they contribute to the message pool over the period examined. 

The terms (S1-S100) are sorted with the highest-frequency terms toward the left 
and lower-frequency occurrences to the right. This is for the entire data set over the 
22-year period. Spikes seen farther off to the right indicate early-use terms from when the 
entire vocabulary was relatively lean. They quickly diminish in importance as time marches 
forward. It is interesting to look at the tabular entropy values from a 30,000-foot level. 
The Ada data is shown in the appendix to enable this view. The early terms seem to 
show pedigree, e.g. Pascal. A late-arriving term shows up with a lot of white space. If it 
is a melding (grafting) onto another technology area that is rapidly growing, then the term 
arrives and stays in the higher frequency ranges, e.g. object orientation. 

The first term is Ada, as would be expected, with a cumulative entropy contribution of 
0.4276. This represents a gentle drop from a high entropy of 0.496. This decline is to be 
expected as more terms are added and the search term influence is diluted. It would be 
surprising if the search term in question lost its position at the number one slot over the 
evaluation period. This would imply that another term, or more likely many other terms, 
are in ascendance relative to the search term. 


[Figure: "Ada Entropy (Top 100 terms)"] 

Figure IV-3 Top 100 Terms 

A review of the top 50 terms also yields an expected result. The strengths of Ada 
are most often cited as seen in Table IV-3. Close examination of the data using the 
column labeled “slope” seems to provide insight into whether a related term is on the rise. 
This indicates that that term’s contribution is adding to the technology, or in this case 
declining relative to Ada. 

Order  Instances  Term                       Average  Slope  Influence  Max Entropy 
1      2194       Ada                        0.418    0.283  +          0.497 
2      368        real-time-systems          0.089    0.093  +          0.148 
3      355        software-engineering       0.195    0.088  +          0.304 
4      351        object-oriented-programm   0.066    0.095  -          0.139 
5      273        program-compilers          0.137    0.074  +          0.217 
6      229        programming                0.119    0.066  +          0.176 
7      221        object-oriented-languages  0.023    0.072  -          0.097 
8      209        software-tools             0.059    0.060  +          0.104 
9      204        aerospace-computing        0.054    0.060  +          0.098 
10     199        military-computing         0.090    0.058  +          0.114 
11     182        formal-specification       0.041    0.056  +          0.085 
12     178        parallel-programming       0.048    0.055  +          0.086 
13     174        software-reusability       0.045    0.054  +          0.088 
14     168        computer-science-educati   0.062    0.055  -          0.104 
15     148        programming-environmen     0.055    0.046  +          0.099 
16     137        distributed-processing     0.061    0.044  +          0.091 
17     133        high-level-languages       0.121    0.042  +          0.251 
18     128        data-structures            0.070    0.041  +          0.110 
19     123        program-testing            0.038    0.041  +          0.062 
20     118        digital-simulation         0.084    0.039  +          0.173 
21     113        software                   0.040    0.038  -          0.065 
22     92         Ada-listings               0.035    0.032  +          0.070 
23     89         C-language                 0.021    0.033  -          0.048 
24     88         program-verification       0.025    0.032  -          0.048 
25     86         object-oriented            0.014    0.034  -          0.046 
26     79         software-portability       0.060    0.029  +          0.178 
27     72         object-oriented-methods    0.015    0.028  +          0.042 
28     72         software-maintenance       0.018    0.027  +          0.044 
29     69         software-reliability       0.023    0.026  +          0.045 
30     68         fault-tolerant-computing   0.031    0.025  +          0.052 
31     66         standards                  0.048    0.024  +          0.108 
32     62         operating-systems-compu    0.055    0.023  +          0.108 
33     60         automatic-programming      0.031    0.023  +          0.045 
34     60         educational-courses        0.022    0.024  -          0.041 
35     59         abstract-data-types        0.010    0.023  +          0.034 
36     58         multiprogramming           0.023    0.023  +          0.039 
37     58         scheduling                 0.021    0.021  +          0.039 
38     56         safety-critical-software   0.006    0.024  -          0.032 
39     55         multiprocessing-programs   0.038    0.021  -          0.108 
40     54         inheritance                0.009    0.023  -          0.032 
41     50         program-debugging          0.030    0.020  +          0.051 
42     48         software-metrics           0.014    0.019  +          0.033 
43     46         graphical-user-interfaces  0.012    0.019  +          0.030 
44     45         Pascal                     0.077    0.017  +          0.186 
45     44         program-interpreters       0.023    0.018  +          0.045 
46     44         concurrency-control        0.011    0.019  +          0.027 
47     43         command-and-control-sys    0.020    0.018  +          0.035 
48     42         knowledge-based-systems    0.018    0.016  +          0.039 
49     41         expert-systems             0.026    0.016  +          0.058 
50     41         systems-analysis           0.032    0.016  +          0.058 

Table IV-3 Top 50 Terms Based on Cum Entropy 

The slope is the comparison of two rates of change: the rate of change for the term 
compared to the rate of change of the technology, in this case Ada. The following is the 
equation for the “slope”. 


$$ \mathrm{slope} = \frac{d(\mathrm{Term}_{ave})/dt}{d(\mathrm{Tech\_Term}_{ave})/dt} = \frac{(\mathrm{LastYear} - \mathrm{Average})_{\mathrm{Term}}}{(\mathrm{LastYear} - \mathrm{Average})_{\mathrm{Tech\_Term}}} \qquad (4.1) $$ 

The LastYear is the last full year of the data set. The MaxEntropy column is the 
peak value of the term’s contribution to the overall entropy of the time step. The 
“Influence” column is determined by whether the value for the last full year of the data is 
greater than the value at some arbitrary, but recent, point in its history 
($\mathrm{Entropy}_{\mathrm{last\,year}} - \mathrm{Entropy}_{\mathrm{last\,year}-4}$). In 
this case, that is four years prior. If the technology in question did fall off of the top slot, 
the terms that are driving the descent would be obvious from both the “Influence” column 
and the “slope”. It would be a clear sign that relative to these ascendant terms, the study 
technology was declining. It might also suggest that, to be rejuvenated, some of the facets 
represented in the ascendant technology should be evaluated for grafting into the study 
technology. For example, if Java were maturing faster than Ada, which we can see is 
happening from the macro data in the next section, the common features (terms) of the 
technologies could be capitalized on. In fact, what has been observed in the case of Ada 
and Java is that combining the technologies gives the best of both worlds 24. 
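
The ratio in Equation (4.1) and the “Influence” rule can be sketched as below. This is 
an illustrative reading of the definitions in the text, not the spreadsheet used to 
build Table IV-3, and it is not expected to reproduce the tabulated values. The 
function names and the yearly entropy contributions are hypothetical; each series is 
assumed to be ordered by year with the last full year at the end. 

```python
def last_minus_average(series):
    """(LastYear - Average) for one term's yearly entropy contributions."""
    return series[-1] - sum(series) / len(series)


def term_slope(term_series, tech_series):
    """Ratio of (LastYear - Average) for a term to that of the technology term."""
    denominator = last_minus_average(tech_series)
    if denominator == 0:
        raise ValueError("technology term shows no change from its average")
    return last_minus_average(term_series) / denominator


def influence(series, lag=4):
    """'+' if the last full year exceeds the value `lag` years earlier, else '-'."""
    return "+" if series[-1] > series[-1 - lag] else "-"


# Hypothetical yearly entropy contributions, oldest first.
tech_term = [0.50, 0.48, 0.46, 0.44, 0.42]      # the search term itself
related_term = [0.02, 0.04, 0.06, 0.08, 0.10]   # a related, rising term

print(term_slope(related_term, tech_term))
print(influence(related_term), influence(tech_term))
```

In this made-up case the related term is rising while the technology term is being 
diluted, so the ratio is negative and the two “Influence” signs differ. 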

From this discussion, it is obvious that the TechTx Basic Entropy model provides 
significantly more insight than the message-counting model. Both are communication 
diffusion models, but the entropy approach provides more insight with only slightly more 
data parsing. 


24 This is based on discussions with Tucker Taft. Taft easily can be considered the chief architect of 
Ada 95. He has grafted Ada and Java at the byte code, virtual machine level. The result was a benefit to 
both languages. 

Figure IV-4 TechTx Basic Entropy Model Predictive Ability Experiment 1 


Figure IV-4 illustrates the predictive capability of the basic entropy model. It is 
interesting to note that the entropy change (ΔS_H) vs. Δ time is performing as one would 
expect. The rate of change is decreasing. This suggests stabilization. From this 
indicator, stabilization could mean two things. One is that the vocabulary and use of the 
technology have settled down. The other is that the pervasiveness once enjoyed in the 
early period is dissipated by other technologies. This has two effects. Ada, by definition, 
is affecting the other technologies, and they in turn are affecting Ada. This is an example 
of both the dissipative and integrative aspects of the baker’s transformation discussed in 
Chapter II. Since we have knowledge about this technology (Ada), it is likely that both 
are occurring. 

The curves for the comparison experiment 1, for Ada, can be seen in Figure IV-4. 

[Figure: "Message Counting and Entropy Approach". Slide footer: Jan 31 2002, M Saboe, 
Systems Dynamics Society 20th International Conference] 

Figure IV-5 Entropy and messages N over time 

One of the ways we determined that the fit for messages vs. time was linear 
follows. When we fit the data by taking all of the data and fitting the curves, then shifting 
one year (12 time steps in this case) and fitting the curves again, the data consistently 
showed that the fit was extremely well correlated, as measured by R². For the 
message-counting approach, we had an average R² of 0.985 for a linear function. For the 
entropy, a power curve fit yields, on average, an R² of 0.962. This is seen in Figure IV-5. 

You will notice that there are several “flat” spots in Figure IV-5. This does not 
detract from the development of relationships of the various extensive and intensive 
variables. Occasionally, there are gaps in data. Regardless, the curve fit is still quite 
good. In the real world, there will also be gaps in data. Later, data for the flat spots 
are needed in order to develop the difference, or change, in the state properties per time 
step. Those data points are approximated by the curve fit. Formally, this is called the 
regression imputation or conditional mean imputation approach. Using ordinary least 
squares regression analysis, we modeled the missing data by predicting it from the data 
observed. This is consistent with methods for small data sets (Myrtveit 2001). 
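
Regression imputation of this kind can be sketched as below. For brevity the sketch 
fits a straight line through the observed points and uses the time step index as the 
predictor; both choices are assumptions made for illustration, as are the function 
name and the sample values. The same idea applies with whichever curve form was fitted 
to the data. 

```python
def impute_by_regression(values):
    """Fill None entries with predictions from a least squares line.

    The line is fitted to the observed (index, value) pairs only.
    """
    observed = [(i, v) for i, v in enumerate(values) if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed points")
    n = len(observed)
    mean_x = sum(i for i, _ in observed) / n
    mean_y = sum(v for _, v in observed) / n
    sxx = sum((i - mean_x) ** 2 for i, _ in observed)
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Keep observed values; replace only the gaps with the fitted value.
    return [slope * i + intercept if v is None else v
            for i, v in enumerate(values)]


# Hypothetical state property per time step with one gap.
series = [1.0, 3.0, None, 7.0, 9.0]
filled = impute_by_regression(series)
print(filled)
```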

A summary of the analysis is given in Figure IV-6. While we tried to fit other 
curves to the data, these clearly came out superior in the data examined. 




R-Squared Values for Ada, World 
252 total steps, 21 years. 

                     Number of Publications         Entropy 
Year  Starting Step  R squared    Equation          R squared    Equation 
1979  3              0.9867       y=18.928x + -50   0.8665       y=0.0105x + 5 
1980  13             0.9901       y=19.366x + -38   0.925        y=0.0094x + 5 
1981  25             0.9919       y=19.774x + -21   0.9726       y=0.0085x + 5 
1982  37             0.9926       y=20.108x + -20   0.9852       y=0.008x + 5 
1983  49             0.9926       y=20.377x + 18    0.9886       y=0.0079x + 6 
1984  61             0.9919       y=20.587x + 40    0.9883       y=0.008x + 6 
1985  73             0.9914       y=20.852x + 61    0.9879       y=0.0081x + 6 
1986  85             0.9907       y=21.126x + 83    0.986        y=0.0079x + 6 
1987  97             0.9898       y=21.429x + 10    0.9829       y=0.0079x + 6 
1988  109            0.9899       y=21.903x + 12    0.9814       y=0.008x + 6 
1989  121            0.9875       y=21.741x + 15    0.978        y=0.0082x + 6 
1991  133            0.9839       y=21.71x + 18     0.9708       y=0.0081x + 6 
1992  145            0.9791       y=21.329x + 21    0.9617       y=0.0079x + 6 
1993  157            0.9708       y=20.983x + 23    0.9459       y=0.0077x + 6 
1994  169            0.9665       y=19.624x + 27    0.9335       y=0.007x + 7 
1995  181            0.9747       y=17.472x + 30    0.9521       y=0.0058x + 7 
1996  193            0.9876       y=15.381x + 33    0.9762       y=0.0049x + 7 
1997  205            0.9918       y=14.301x + 35    0.9901       y=0.0046x + 7 
1998  217            0.988        y=13.755x + 37    0.9864       y=0.0048x + 7 
1999  229            0.9646       y=13.742x + 38    0.9605       y=0.0045x + 7 
2000  241            0.9891       y=7.2797x + 41    0.8915       y=0.0035x + 7 
Average              0.985295238                    0.962433333 

Figure IV-6 Curve fit for Messages and Entropy with various data subsets 

For experiment 1, a power-law curve fit for entropy ($S_H$) using the 
complete data set gives 

$$ S_H = 4.183\, t^{0.185} \qquad (4.2) $$ 

For the power-law curve fit for entropy ($S_H$) using 5 years, we get 

$$ S_H = 4.34\, t^{0.157} \qquad (4.3) $$ 

For the power-law curve fit for entropy ($S_H$) using 10 years, we get 

$$ S_H = 4.35\, t^{0.153} \qquad (4.4) $$ 

The TechTx Basic Entropy model error for all of the predictions is in the range 
from -8% to +5%, which is reasonable when we realize that we are trying to predict the 
future. Note that the model tends to err on the conservative side, i.e. all of the out-year 
prediction errors are negative. This conservative prediction is due to the wobble in the 
data in the 1985-1986 range. This wobble reverberated in the out years and drove the 
out-year predicted values down. 
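
A power-law fit and its prediction error can be sketched as below. One common way to 
fit S = a·t^b is ordinary least squares on (ln t, ln S); the text does not state how 
the fits here were computed, so that method is an assumption. The function names are 
hypothetical, and the coefficients used to generate the sample data (a = 4.0, 
b = 0.2) are made up rather than taken from the fitted equations. 

```python
import math


def power_law_fit(ts, ss):
    """Fit S = a * t**b by least squares on (ln t, ln S). Returns (a, b)."""
    log_t = [math.log(t) for t in ts]
    log_s = [math.log(s) for s in ss]
    n = len(log_t)
    mean_x = sum(log_t) / n
    mean_y = sum(log_s) / n
    sxx = sum((x - mean_x) ** 2 for x in log_t)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_t, log_s))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def percent_error(predicted, actual):
    """Signed prediction error in percent; negative means under-prediction."""
    return 100.0 * (predicted - actual) / actual


# Synthetic entropy values for years 1..20 from a made-up power law.
years = list(range(1, 21))
entropy = [4.0 * t ** 0.2 for t in years]

# Fit on the first 5 years only, then predict the final year.
a, b = power_law_fit(years[:5], entropy[:5])
predicted_final = a * years[-1] ** b
print(round(a, 3), round(b, 3))
print(round(percent_error(predicted_final, entropy[-1]), 3))
```

With noise-free synthetic data the early fit recovers the generating curve, so the 
out-year error is essentially zero. With real data, a disturbance inside the fitting 
window shifts a and b and therefore biases every out-year prediction in the same 
direction. 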

Other studies were conducted to determine whether other forms of the regression 
curves would fit better. The forms evaluated were linear, power, exponential, 
logarithmic, polynomial, and time series (moving average). The logarithmic form fared 
favorably against the power form for the entropy model for long ranges of data, but 
poorly when trying to fit limited data points and predict the future. The time series and 
polynomial forms obviously can fit the data very precisely. The time series lacks 
predictive value beyond the number of periods in the moving average. The polynomial 
can very accurately match the actual data points, but predicts very poorly with 5 and 10 
years of data. 

For the balance of this section, covering the eight technologies, we use the pure 
experiment 2 data set. The Ada data and resulting curves for experiment 2 are shown in 
Figure IV-7. 

[Figure: Experiment 2, Cumulative Entropy vs. Year. Ada: 1583 Terms, 18006 Instances, 
4249 Messages] 

Figure IV-7 Ada TechTx Basic Entropy Experiment 2 

5. Temperature from a Grand Canonical - Partition Function 

Let’s look at the temperature of the system and real data in yet another way. The idea 
of developing the maximum entropy can get a bit obscured with the empirical data, 
because we are always adding terms and vocabulary. We shall evaluate 
the temperature-maximum entropy relationship of the distribution function by first 
looking at the maximum entropy for a number of small alphabets. Chapter III indicated the 
number of potential microstates at each q-level for an alphabet of 128 terms, and the 
maximum entropy for small alphabets of 128, 96, 64, 32, and 16 terms. We observed that 
the maximum q-level entropy decreases as the alphabet size increases. So if terms are 
added to the vocabulary every time step, there is damping of the maximum entropy curve. 
The early (lower) q-levels will be filled much faster than the higher q-levels, which give a 
smaller and smaller contribution to the entropy pool as the vocabulary increases. A way to 
think of this is filling a bathtub with hot, energetic molecules while at the same time more 
and more cool, low-energy molecules are also being added. There comes a point 
where the hot particles are a smaller share of the population, and there is weighting 
toward the lower q-levels. 

Recognizing that this is a system with an influx of terms being discovered (we 
permitted them to exist in the alphabet when they were simply a potential concept set of 
terms), we will see a bias toward the lower q-levels. In a way, we may want to view this 
as a system that is in contact with a large reservoir at ambient conditions. Finally, the 
system is at an equilibrium with the reservoir, but since there are more states available in 
the technology system, it still attracts messages. 

However, if we draw an appropriate control volume around the terms in use, this 
will indeed approach maximum entropy. Simply start with a small number of terms (4, 8, 
16, 32, etc.), and we can see that all of the states are soon occupied. Here we illustrate an 
alphabet of 4 terms, the top four terms in what will turn out to be well over 1000 terms in 
the technology’s alphabet. However, for the purpose of validation and illustration, Figure 
IV-8 fits the bill. 

[Figure: "Max Entropy", Top 4 terms Measured] 

Figure IV-8 Max Entropy in a Small Alphabet (measured) 


The upper curve, shown as a red dashed line with a Δ marker, represents the 
maximum entropy that an alphabet of four terms can have in a q-level. In the case of 4 
terms, q-level 2 has the maximum number of microstates, 6, out of a total 
multiplicity of 16 possible configurations. For q-levels 0 through 4, the microstates are 
1, 4, 6, 4, 1. 
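
The microstate counts quoted above are binomial coefficients, which the following 
sketch reproduces. The per-q-level maximum entropy is computed here as -p·log2(p) with 
p = C(n, q)/2^n; that formula is an assumption about how the maximum entropy values 
are formed, although for four terms it gives 0.25, 0.5, 0.5306, 0.5, 0.25, which 
agrees with the Max Entropy row of Table IV-5. The function names are hypothetical. 

```python
import math


def q_level_microstates(n_terms):
    """Number of distinct term subsets of each size q = 0..n_terms."""
    return [math.comb(n_terms, q) for q in range(n_terms + 1)]


def q_level_max_entropy(n_terms):
    """Entropy contribution -p*log2(p) of each q-level when all subsets
    are equally probable, with p = microstates(q) / 2**n_terms."""
    total = 2 ** n_terms
    contributions = []
    for count in q_level_microstates(n_terms):
        p = count / total
        contributions.append(-p * math.log2(p))
    return contributions


microstates = q_level_microstates(4)
max_entropy = q_level_max_entropy(4)
print(microstates, sum(microstates))
print([round(v, 6) for v in max_entropy], round(sum(max_entropy), 6))
```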

The next curve, a black dashed line with a • marker, represents the measured entropy 
for the most mature time step (yearly) in the technology’s 4-term vocabulary. This is 
shown in Table IV-5. The first row is an indication of the q-level, where 0 indicates the 
null set {}. There is always one null set. The second row is the primitive message count of 
subset combinations, with the sum at the far right. The third row is the entropy in a 
q-level. The last row indicates the q-level maximum entropy that can be achieved, i.e. 
with equi-probability of the sets. 


q-level          0         1       2         3         4         sum 
2000             1         4204    1352      166       8         7731 
entropy_q-level  0.001671  0.4779  0.439922  0.118985  0.010261  1.048767 
Max Entropy      0.25      0.5     0.530639  0.5       0.25      2.030639 

Table IV-5 Measured Entropy, microstates, and maximum entropy 
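
The measured entropy row can be recomputed from the count row as a check. The sketch 
below assumes each q-level contributes -p·log2(p), with p equal to its count divided 
by the tabulated sum (7731); under that assumption it appears to reproduce the 
entropy_q-level row to within rounding. The function name is hypothetical, and the 
normaliser is the sum as printed in the table rather than a recomputed total. 

```python
import math


def q_level_entropy(counts, total):
    """Entropy contribution -p*log2(p) of each q-level, with p = count / total."""
    contributions = []
    for count in counts:
        p = count / total
        # An empty q-level contributes nothing to the entropy.
        contributions.append(-p * math.log2(p) if p > 0 else 0.0)
    return contributions


# Count row and sum as printed in Table IV-5 for the year 2000.
counts = [1, 4204, 1352, 166, 8]
tabulated_sum = 7731

measured = q_level_entropy(counts, tabulated_sum)
print([round(v, 6) for v in measured])
print(round(sum(measured), 6))
```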

This shows us that the entropy does increase and starts to approach the 
maximum. Figure IV-9 shows how the temperature term computed from this small 
alphabet increases over time steps. 

[Figure: "Temperature = f(Max Entropy, Entropy)", Top 4 terms Measured. Series: 
Temp = f(max entropy)] 

Figure IV-9 Small Alphabet (4 terms) from Ada, Temperature term vs time 

6. Validating the Partition Function 

We validate the partition function for the model using Ada. The microstates of 
the alphabet and vocabulary for Ada are related to the primitive terms n in the various 
q-levels. A technology example of microstates, q-levels and entropy is shown in Figure 
IV-10. The number of microstates, or set entities (primitive messages), in the various 
q-levels (x-axis) is shown on the left-hand y-axis. The cumulative entropy, obtained by 
computing the entropy of each q-level, is shown on the right-hand y-axis. 

[Figure: "Ada ntuple sets: Entities, q levels, Entropy (Cumulative)". Slide footer: 
Feb 2002, M Saboe, Ph.D. Defense 2002] 

Figure IV-10 n microstates distribution to q-levels, and Cumulative Entropy 

Figure IV-11 q-level distribution, actual and modeled, probability and the Weibull 
distribution. 

We validate the computation of a temperature from the partition function using 
Ada. In Figure IV-11, on the left-hand y-axis, we show the number of microstates 
populated by sets of terms. Each pair of bars represents a calculated microstate 
occupancy and the actual q-level primitive message occupancy count. The calculated value 
is the left bar of the pair. The curve tracing the bars is the probability of being found in 
the q-level. The probability is associated with the secondary y-axis on the right. We can 
also see the cumulative probability distribution function, which is the upper curve. This 
curve approaches 1, with each q-level having a smaller and smaller probability 
of being entered. The curve shows that there is over a 75% chance of being in q-levels 1 
through 4. This distribution can be modeled as a Boltzmann-Gibbs distribution function 
using (3.98). For the sample in Figure IV-11, we have 

$$ \alpha = 1.01068 \approx 1, \qquad \gamma = 0, \qquad kT = 108{,}313 $$ 

We end up with R² = 0.999397, a very good correlation. Refinements to the 
model will enable determination of the temperature in degrees Saboe (°Saboe). 


[Figure: "Temperature Sensitivity to Granularity: Temperature vs Entropy for Technology 
(Ada)". Series: Temp = dn/dS (single terms, not Legendre); Temp = dN/dS (records, not 
terms); microstate multiplicity of term sets from partition function] 

Figure IV-12 Temperature Sensitivity to Granularity. 

Figure IV-12 shows the sensitivity of the temperature term to abstraction. 
Chapter III discussed the approach to counting the conserved extensive variable. The 
lower, blue curve computes the temperature as a function of the trend line approximation 
of mean records N being processed. Similarly, the middle, green curve relates to the 
single-term distribution of the trend of primitive messages. This is a finer granularity 
than a record, but does not yet meet the desired set of complete conditions to describe 
temperature. The upper, red curve illustrates the actual data points, not the mean 
of the trend line. This is the temperature term of the data when the sets of sets of terms 
that we observed were allocated to q-levels. Even these are still an approximation, since 
all of the terms in the bin were given equal probabilities. 

We also note that the exponent α is greater than one. This is as suggested by 
Prigogine for a social system. We might restate Prigogine’s comment more generally as 
“for a non-physical system the exponent α on the conserved property interactions might 
be greater than one.” 

Figure IV-13 shows that the data using the partition function does in fact perform 
as the theory in Chapter III predicted. As time passes, the technology heats up and 
consumes free energy states, and also heats up due to the addition, across the control 
boundary, of additional terms as answers (!!!) that previously were questions (???). We can 
observe that with time the trend is obvious and very predictable. The confidence level on 
this data is on the order of $1/\sqrt{118{,}141} = \pm 0.3\%$. Due to the construction of 
the model, the data will always have a very tight confidence limit. Even as few as 1041 
primitive terms will yield $1/\sqrt{1041} = \pm 3.1\%$. 

Figure IV-13 Ada Partition function validation 


The following technologies were evaluated. The entropy and linear model curves 
are compared in Figure IV-14. A thumbnail for each technology is shown below. 

[Figure: Experiment 2, Cumulative Entropy vs. Year; y-axis: Entropy S_k (Bits). 
Java: 2813 Terms, 28907 Instances, 5330 Messages, 6 Years] 

Figure IV-14 Java and Ada Comparison Entropy S_k vs k (time step = years) 

• Java 

[Figure: "Java relationships". Panels: Java n, primitive messages vs time step; Java Temp 
based on local; Java Entropy vs time step; Java Distribution of Messages by q-levels; 
Java Beta and “Mindshare”. Slide footer: Mar 2002, M Saboe, Ph.D. Defense 2002] 

Figure IV-15 Java Relationships 


For an early Ada example seen in Figure IV-16, we can observe that both curves,
the curve for the Lyapunov exponent and the curve for Shannon’s entropy, have the same
power law form. The figure shows both entropy measures as a function of time step.
S_H, the information theory entropy measure, is on the left y-axis, and S_B, which comes
from the eigenvalue of the micro control model (hence in the range of 0 to 1), is on the
right-hand y-axis. The scales were adjusted to make it easy to see that both curves are of
the same form. In addition, we can see for this early data set that the R² values are
reasonable, at 0.968 for system level entropy and 0.96 for the baker’s transformation j.
We can see that as the system entropy stabilizes, the eigenvalue of the feedback control
dynamical system is also stabilizing.


Initially, to determine the form of the functions, the average value β ≈ 10% was
used. This was found by iterative guesses of a fixed β. This approximation of β was used
to satisfy the macroscopic rate of change of entropy. This suggests that we have the right
form of the dynamical system control model matched to the macroscopic system model.
This also suggests that the model does approximate the observed conditions.

[Figure: Entropy S_H and Dynamical System j. Second series: Entropy (S_B), f(j, β).]

Figure IV-16 Macro Equilibrium S_H and Eigenvalue j Stabilization

Equation (3.77) develops Shannon entropy in terms of j_k, which we know
from (3.70) is a function of β. At this point β was adjusted until the entropy (eigenvalue)
of the discrete model matched the macroscopic entropy of the information theoretic
model. In each time step, the two methods of computing the entropy
were matched to within a tolerance of 0.1%. This is seen in Figure IV-17. The upper curves
(the two are superimposed) represent entropy converging at the same time steps for the
system. The lower curve represents β, which changes over time. The secondary y-axis, on
the right, gives β as a percentage.
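
The adjustment of β described above can be sketched as a one-dimensional root search. The sketch assumes that the model entropy is a continuous, increasing function of β on the search interval. `model_entropy` is a hypothetical stand-in for the combination of (3.70) and (3.77), which are not reproduced in this section, and the function names are illustrative. Only the 0.1% tolerance comes from the text.

```python
from typing import Callable, List, Sequence

def solve_beta(
    target_entropy: float,
    model_entropy: Callable[[float], float],  # hypothetical stand-in for (3.70) + (3.77)
    lo: float = 0.0,
    hi: float = 1.0,
    rel_tol: float = 1e-3,  # 0.1% match between the two entropy values
    max_iter: int = 200,
) -> float:
    """Bisect on beta until model_entropy(beta) is within rel_tol of the target.

    Assumes model_entropy is continuous and increasing on [lo, hi] and that
    the target lies between model_entropy(lo) and model_entropy(hi).
    """
    if not (model_entropy(lo) <= target_entropy <= model_entropy(hi)):
        raise ValueError("target entropy is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        value = model_entropy(mid)
        if abs(value - target_entropy) <= rel_tol * abs(target_entropy):
            return mid
        if value < target_entropy:
            lo = mid
        else:
            hi = mid
    raise RuntimeError("beta did not converge within max_iter iterations")

def match_beta_per_step(
    targets: Sequence[float], model_entropy: Callable[[float], float]
) -> List[float]:
    """Solve for beta_k independently at every time step k."""
    return [solve_beta(s_k, model_entropy) for s_k in targets]
```

Solving each time step independently mirrors the description of β changing over time while the two entropy curves stay superimposed.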


[Figure: Entropy and Beta_k. Series: Entropy (Lyapunov, (Beta)). Slide footer: M Saboe,
Ph.D. Defense 2002, Feb 2002.]

Figure IV-17 Solving for β to converge S_B and S_H


At this point, we are treating the “community” as one large node. In the real world,
the community is partitioned into a volume of performing nodes, and these nodes have
different performance rates. However, at the community node level, we cannot
distinguish the contributions of mind share and learning. It would be useful to
tease apart the contribution that is due to mind share from that which is due to learning.

A quick look can be obtained by allocating each author’s contribution. This is
done by dividing β by the number of authors to obtain ⟨β_k⟩, the average
feedback message request per author. This result is shown in Figure IV-18. The dashed red
curve represents the allocation of β to each author, using the left y-axis. The right-hand
y-axis gives the accumulated number of authors, or “mind share”, which increases over
time.

Figure IV-18 β Feedback requested from persistent messages, allocated per author.

We can see that β_k is decreasing over time. β is also decreasing with the number of
total messages, or tasks performed. Either learning is occurring, or the messages
are becoming more easily understood. Understanding the message and immediately being
able to act on it can be considered the result of learning, or of improved packaging of the
message. Discussing the various learning curves is beyond the scope of this research.
However, Chapter VI suggests future research directions that may relate a form of the
learning curve to entropy.
V. SUMMARY OF CONTRIBUTIONS


The ability to bridge these two previously disconnected views of a physical and
non-physical world provides powerful analytical tools to the software
engineer. This is a nontrivial contribution to the software engineering community: we
can put methods in the hands of software engineers that can be readily grasped by the
mechanical, electrical, or communication engineer, or by anyone who has had some basic
physics. This lowers the barriers to use by reducing the effort required to unpack,
decipher and understand the “communication protocol” of the user community for this
technology. In this case, the technology under study was technology transfer. Since we
communicated with a set of symbols that are canonically related, and with a method
that is already common to the engineer and scientist, we have increased the available
high q-level microstates, which contain powerful concepts.

This research tied three main components together in the TechTx Entropy
Feedback model: information theory, statistical mechanics, and the
dynamical control model of technology transfer. In a relatively comfortable
way, we have tied in Rogers’ innovation (the software information base element) and his
communication network of exchanges of information, which reduce uncertainty and improve
the mutual information of the sender, receiver and consumer. We also address the time
aspect. Recall that the baker’s transformation iterations of folding, stretching, rotating
and translating represent a mathematician’s view of time. In order to address time and all
of the other observed aspects of technology evolution, we use the information theory
entropy and the chaos control model. A critical aspect of this was relating the two
views of entropy, which took clock time out of the picture and related
time to mixing and the baker’s transformation.

The major contribution is the development of a series of equations of state that
define evolutionary models. The key element was the approach of providing an engineer
with a relationship between temperature, entropy and a conserved property. Temperature is
expressed in fundamental information units and referred to as degrees Saboe (°Saboe).
Temperature is significant because it relates the maximum complexity of a system to the
current complexity. This is a proven metric that can be applied in many places in software
engineering, e.g. software complexity. A direct relationship can easily be made to
Halstead’s metric, which is familiar to software engineers. This in turn has been related to
the rate at which humans are capable of making decisions between two choices, e.g.
alphabet sets of sets of operators and operands, operators and edges, operators and flows.

These equations enable the development of the temperature of a process in four
ways.

1. From (3.25) where we look at the slope of the change of microstates to the
change in the conserved property of two interacting subsystems.

    \frac{1}{T} = \frac{\Delta S}{\Delta n}    (3.25)


2. The second approach looks at the dynamical system model. Here a pair of
dynamical equations (3.80) represents the discrete interactions and provides a
relationship to temperature. We can partition the macroscopic world down to represent
trajectories of microcanonical ensembles and their probability distributions at the node
interaction level.

    S_{k+1} = S_k + m_N N_k    (3.78)

    N_{k+1} = N_k + m_S S_k    (3.79)

    \frac{S_{k+1} - S_k}{N_{k+1} - N_k} = \frac{m_N N_k}{m_S S_k} = \frac{1}{T} = \frac{\Delta S}{\Delta N},
    \quad T = \frac{m_S S_k}{m_N N_k}    (3.80)


3. The third approach is through available occupancy microstates related to
maximum entropy and the partition function, (3.88).

    P(q_i) = C e^{-q_i / kT}    Boltzmann-Gibbs    (3.87)

If we sum over all of the q-levels, we get the partition function Z, given by

    Z = \sum_i e^{-q_i / kT}    Partition function    (3.88)

and the more general distribution in the form of the Weibull function,

    P(q_i) = e^{-\left( \frac{q_i - \gamma}{\beta} \right)^{\alpha}}    Weibull Distribution    (3.98)

4. Closely related to all of these is the apparent relationship of temperature being
proportional to pressure, where pressure is in terms of the conserved property per unit
volume, or messages per node. This was seen in (3.34):

    P(k) = \frac{m_P}{m_T} \left( T(k) - b_T \right) + b_P    (3.34)

This is dimensionally correct, as we saw from the partition function normalizing
condition. It also turns out that learning has the same dimensions. The time to perform a
task is related to the cumulative messages performed per node per time step.
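
The first and third routes to temperature can be illustrated numerically. This is a minimal sketch under stated assumptions: entropy and message counts are supplied as plain cumulative lists per time step, the constant k is taken as 1 (information units), and the Boltzmann-Gibbs exponent is taken with the conventional negative sign. All function names are illustrative rather than taken from the dissertation's analysis macros.

```python
import math

def temperature_from_slope(entropy, messages):
    """Route 1, as in (3.25): 1/T = dS/dn, estimated by finite differences.

    entropy[k] and messages[k] are cumulative values at time step k.
    Returns one temperature per interval, or None where dS is not positive.
    """
    temps = []
    for k in range(1, len(entropy)):
        d_s = entropy[k] - entropy[k - 1]
        d_n = messages[k] - messages[k - 1]
        temps.append(d_n / d_s if d_s > 0 else None)
    return temps

def boltzmann_distribution(q_levels, temperature, k=1.0):
    """Route 3, as in (3.87)-(3.88): P(q_i) = exp(-q_i / kT) / Z."""
    weights = [math.exp(-q / (k * temperature)) for q in q_levels]
    z = sum(weights)  # partition function: sum over all q-levels
    return [w / z for w in weights], z

# Illustrative inputs only; these are not values from the Ada or Java data.
print(temperature_from_slope([0.0, 2.0, 3.0], [0, 4, 8]))
probs, z = boltzmann_distribution([1, 2, 3, 4], temperature=2.0)
print(probs, z)
```

Dividing by Z is the normalizing condition: it plays the role of the constant C in (3.87), so the probabilities over all q-levels sum to one.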

The most significant contribution is the relationship of the dynamical systems
model to the baker’s transformation.

This is of the form of the baker’s transformation. In this more general case,
p_s is a rational number. Here p_s = 1 + …

We saw the relationship to reversible and irreversible
entropy components, portation and production, mutual information and Bayesian
conditional probability, and chaos and order. To the author’s knowledge, the relationship
of p_s in the coefficients of the baker’s transformation to temperature had never been
shown before for technology transfer or software evolution.


The social structure, as defined by Rogers, is not directly addressed in the model,
but rather would be addressed by a social network analysis method such as Burt’s
structural holes. Another approach is to look at the money distribution and exchange
between research organizations. Their revenue, money, would be exchanged
with the environment. We might say, making the simplifying assumption that the only
major stimulus is funding, that the funding distribution by performer bands per capita
might give insight into a stimulus aspect (heat). This follows from studies of the economy
using statistical mechanics (Dragulescu 2000).

The model we have described here is analogous to those used for working with 
mass flows, entropy, pressures and temperatures. There is no discussion of the strength 
of the materials (e.g. social structure 25 ) or the details of the implementation of the end 
product - the engine. This is as it should be. 

We have laid out the fundamentals: the size of the nodes, the production (ṅ,
message flow), and, hidden in here, the elements of pressure (messages per node)
and temperature T, the reciprocal of the uncertainty slope dS/dn = 1/T, the coldness
(Fraundorf 2000), (Schroeder 2000). Massieu (1869) provided the starting point for the
generalized ensemble relations with the Massieu-Planck functions for statistical entropy
(Munster 1969), (Planes 2002). A set of entropic potential formulations for
technology transition dynamics is now available to the community.

We now have available a method to measure a technology’s temperature.
Temperature represents the propensity for a system to share properties, information in
this case. We have worked with some basic tools and used the quantitative version of the
zeroth law. This could apply to many aspects of software systems, even indices in data
structures when properly constructed. The theory under all of this need not apply only to
energy or information. It also applies to unequilibrated systems sharing conserved
quantities (money, for example), if the only prior information we have is how the
multiplicity of ways that quantity can be distributed depends on the conserved quantity to
begin with (Planes 2002).

When we have other kinds of information, such as knowledge of a system’s
temperature (the slope of the uncertainty) but not its total information, then the broader
class of maximum entropy strategies in statistical inference (e.g. the canonical and grand
ensembles) predicts the distribution of outcomes we can expect as well.

25 The veracity (social capital, per Burt) of the publisher (Carnegie Mellon vs Podunk Community 
College) is left to the designer of the engine desired, and the implementers who fabricate the engine. 
1. Technology Transition Engine

Now that we have established the basic relationships of the TechTx Entropy
models, let’s put them in the framework of a system. We can put it all together as an
evolutionary technology transfer system that has probabilistic effects at the macro level
and deterministic, dynamical effects at the microstate level. We have the tools to
analyze a program and represent it as an engine.

2. Control Volume 

It is useful to define a control volume that is typical of the system (Figure V-1). In
a traditional continuous system in a physical world, a control volume identifies the
boundaries of the system. In such a continuous system, say an engine, a mass flows a
distance and contributes to the work performed. It is not unusual to partition a
continuous control volume into stages, e.g. a compressor, a combustor, a diffuser. As the
mass m flows from stage to stage, we can consider it a state transition of the system and,
locally, of the nodes (compressor, combustor, diffuser). There are n masses flowing, each
one unique, so the system and the nodes take on different states for the complete
elaboration of the mass-node state combinations. For now, let’s look at all three nodes
in the message model, with the mass replaced by the message moving through the control
volume. This causes both local and system level state transitions.
[Figure: Control Volume, Continuous and Discrete Example. Both panels show the stages
Compressor, Combustor, Turbine. The nodes transition to a different state as the mass m
is present. This is the analog of a discrete state machine in a continuous system.]

Figure V-1 Illustration of a Control Volume as a Continuous System or as a
Discrete State Machine


Similarly, in this discrete state machine, we have drawn the boundaries around the
three nodes. Full elaboration of all of the message (m) states within the control volume
would represent all the possible states of the bounded system. With this, we can
represent an individual interaction, an organizational interaction or even a macro
technology transfer system such as the economy.
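
Full elaboration of the message states can be made concrete with a small enumeration. The sketch below assumes, purely for illustration, that each message sits in exactly one of three named nodes at a time. The node names follow Figure V-1 and the function name is hypothetical.

```python
from itertools import product

NODES = ("compressor", "combustor", "turbine")

def enumerate_states(n_messages: int, nodes=NODES):
    """Return an iterator over every assignment of distinct messages to nodes.

    Each state is a tuple whose i-th entry is the node holding message i,
    so the bounded system has len(nodes) ** n_messages states.
    """
    return product(nodes, repeat=n_messages)

states = list(enumerate_states(2))
print(len(states))  # 3 ** 2 states for two messages
print(states[0])
```

The state count grows exponentially with the number of messages, which is why the macro-level treatment works with distributions over states rather than with individual states.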


3. State and Cycle Diagrams 

These technology transition dynamics tools permit us to engineer a solution to get
maximum efficiency out of our resources. Let’s examine some of the state diagrams
and system quantities in Figure V-2.

[Figure: System Quantities Q, H, W.]

Figure V-2 Technology Transfer State Diagram, System Quantities


This section develops the relationships of a temperature-entropy (T-S) diagram,
familiar to mechanical engineers from engine thermodynamic cycle analysis.
We suggest the conditions for moving from one “pressure - temperature
- entropy” state (numbered 1-4) to another.

Here we have a process depicted in macro state space that originates at point 1
with T_lo, P_lo, S_lo, which are the ambient conditions of the surroundings, a
reservoir. In a sense, we see the input as work, energy or heat. In technology transfer
dynamics, we can think of this as effort, which is added to the system, yielding
“energetic” messages. We see an isentropic (constant entropy) compression as the
system moves along the path 1-2 to T_2, P_hi, S_lo. The temperature is increasing
because some effort is being expended to reduce the volume in which the interaction
between entities occurs. More occurrences of existing terms consistently show up in
messages. Terms are combined to get to concepts that are more powerful. While there may
be less volume (fewer nodes), the message term content has higher density. In the model
proposed, there would be fewer nodes, but they would be doing very intense research, i.e.
producing many high-quality messages. They interact closely and publish messages
generally within the confines of the system.

During the progression from state point 2 to 3, energy in the form of effort is
added at constant pressure. Entropy increases from S_lo to S_hi. Think of this as a
demonstration. No new basic research is being performed; the science is being scaled up
and loaded with a lot of energy that will make it attractive to consumers. This occurs
when the technology is diffused from state 3 to 4. A high pressure, concentrated set of
messages escapes into a larger volume. In order for this to happen, the message entities
must somehow move to a bigger volume, must somehow escape. This is where work is taken
out, as products are delivered to a market (ambient). This is shown as a constant entropy
line, with a rapid drop from T_hi, P_hi, S_hi at state point 3 to T_4, P_lo, S_hi at state
point 4. Work, in thermodynamic terms, is represented by extensive property rate changes.
For example,

    W = \dot{n} C_p (T_3 - T_4)    (5.7)

where W is work (product) yield, ṅ is message flow, C_p is the specific heat at
constant pressure, and T is the temperature. While technology transfer dynamics
doesn’t have foot-pounds per se, it does have a state change per time step, and terms and
sets of sets of terms are the extensive property. The sets can even have “weights” based
on the primitive terms in the set, or the q-level.

Figure V-3 illustrates the T-S cycle diagram for Ada. We must recognize that this
was not an “engineered” system, yet we can still see the faint trace of the cycle of Figure
V-2. Note the superimposed cycle diagram. We recognize that we cannot achieve
constant entropy, so the first state transition moves up and to the right. This is when the
early researchers, innovators and early adopters, are at work. The next state change, the
chord which moves off to the right, might suggest that early adopters and early majority
are adopting and performing experiments, withstanding some pressure above ambient,
and demonstrating internally and externally. In fact, it looks like there is a steady
increase in pressure until the maximum, when the system starts to diffuse: the state
transition that drops off toward increasing entropy but lower temperature. Lower
temperature implies lower pressure. The rectangle represents the ideal cycle, the Carnot
cycle. The maximum efficiency is limited by the ratio T_lo/T_hi.
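
Equation (5.7) and the Carnot limit can be evaluated directly once state-point temperatures are known. This is a minimal sketch, assuming temperatures in degrees Saboe, a message flow per time step, and a specific heat that has been calibrated elsewhere. The numeric inputs are made-up placeholders, not measurements from the Ada data, and the function names are illustrative.

```python
def work_yield(message_flow: float, specific_heat: float, t3: float, t4: float) -> float:
    """Work (product) yield per (5.7): W = n_dot * C_p * (T3 - T4)."""
    return message_flow * specific_heat * (t3 - t4)

def carnot_efficiency(t_lo: float, t_hi: float) -> float:
    """Upper bound on cycle efficiency: 1 - T_lo / T_hi."""
    if not 0 < t_lo <= t_hi:
        raise ValueError("require 0 < t_lo <= t_hi")
    return 1.0 - t_lo / t_hi

# Placeholder inputs for illustration only.
print(work_yield(message_flow=100.0, specific_heat=0.5, t3=40.0, t4=25.0))
print(carnot_efficiency(t_lo=20.0, t_hi=40.0))
```

The bound depends only on the two reservoir temperatures, so raising the peak temperature relative to ambient is the only way to raise the ceiling on efficiency.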

[Figure: Temperature - Entropy (T-S). Entropy, 2 Interacting Systems, T_S (A).]

Figure V-3 Temperature Entropy Diagram - Ada

The research tied together fundamental elements underlying technology
transition. Systematic techniques for assessing macro mechanisms for
transferring software engineering technologies have been thoroughly reviewed and
systematized. This dissertation developed the fundamental elements of an industrial
model of a software technology transition engine. The mechanisms developed utilize
information theory, communication theory, chaos control theory, and learning curve
principles. The combination of those scientifically sound mechanisms provides a basis
for assessing and/or prescribing a portfolio of technologies and the implementing macro
infrastructure. Linkages to lower level models and implementation methods are
provided. This research provides the engineering framework for a practical method for
a program manager to establish a high capacity transition channel, which accelerates
technology maturation and insertion. Data samples assess the following technologies:
software technology transfer, Ada, Java, abstract data types, rate monotonic analysis, cost
models, software standards, and software work breakdown structures. Also included are an
extensive annotated bibliography on software technology transfer and related references,
and a bibliography including related material from philosophy, psychology, math,
physics, thermodynamics, management, economics, game theory, technology transfer,
software engineering, and systems engineering.

The application of foundational relationships permits the development of a software
technology transition engine.

Finally, it is left to the community to determine whether this is satisfactory to
support the following logic:

• since we should be able to accept that a process is just a program 
(Osterweil 1987) and 

• software can represent the program, and 

• the engine is the representation of a process that was based on axiomatic 
and logical transfers from established science and engineering (physics 
and thermodynamics) 

• The basic elements of the physics of software have been developed 

A broad area of future research is outlined in the next section. 
VI. IMPLICATIONS FOR FUTURE RESEARCH.


The research explored the use of entropy in information theory. Great effort was
put into ensuring that units (as in dimensional units) are consistent across the various
analysis techniques using measures. The unit analysis drives toward statistical inference
techniques. For example, the common unit for length is measured in informational units
and related to various distributions. This section suggests future research in the
areas of:

• Development of “engine” design and analysis, applicable to technology 
transition, evaluation and risk and general enough to be applied to the 
evolutionary software development process, and software itself. 

• Application of the entropy metric to the evolutionary software 
development process. 

• Linkage of messages in the software development process to the software 
application. 

• Analysis and linkage of software to the information theoretic and
dynamical systems models; the dynamical system linkage is only now available as
the result of this research.

• Development of a complexity metric for software, which computes the
temperature from both the structure (information theoretic) and the flow
(dynamical micro model)

• Learning curve relationship of performance and entropy 

• Exploration of the use in molecular and biologically inspired computing, so 
that we no longer “program” software rather we grow it. 

• Developing the relationship to quantum mechanics and exploring the
possibility of an “information field theory”. This would explore maximum
entropy as the underpinning construct that governs physical gravity, or the
tendency for bodies to attract, i.e. desire mutual information through the need
for correlation of various properties.

• Finally, explore the implications of relationships of software physics to a 
quantum theology, and the true mysteries of the universe. 
A. TECHNOLOGY TRANSITION ENGINE


The work drives toward an "engine" that has a simple control mechanism, just as
one might imagine: a gas pedal or throttle. This means all of the various components
are in balance (there is a predicate relationship at the boundaries that must be satisfied)
and represent a dynamic system. The engine is also affected by the economy (the
environment) at the control volume boundary of the system. Let's suggest a metaphor for
additional research. Assume that the technology transfer engine is like a jet engine: the
amount of thrust it can produce from the ejected (and conserved) quantities is very much
a function of the thermodynamic design of the engine. This is the bulk of the effort in the
model; however, if the jet's diffuser ejects at a speed relative to the engine's forward
motion and the high altitude jet stream, the total speed is some aggregation of all of these
effects. Since we wish to predict, with some confidence, whether a technology will
arrive at a given time, these "macro economic - environmental" factors must be
represented in the model.

A promising direction for further research is developing that metaphor of
thermodynamics and information theory. From that point of reference, one can envision
a second law analysis, i.e. focusing on the inefficiencies. Those inefficiencies establish
the requirements to the technology base in a "problem oriented", "requirements pull"
approach. Viewing this in the thermodynamic cycle metaphor, imagine the waste heat
going out the exhaust (i.e. scrap and rework in the software development process) being
redirected to preheat or regenerate the input into the cycle (i.e. guide the research agenda
and focus on the heavy payoff opportunities).
[Figure: Temperature Entropy Diagram.]

Figure VI-1. Technology Transition Engine Temperature Entropy Diagram.

Future research should experiment and calibrate the specific heats of various 
technologies and software. Ambient temperature should be calibrated for general regions 
of research in technology domains. 

A software technology transition (Tech Tx) engine could be analyzed with the 
tools (Temperature, Pressure, entropy, messages, and specific heat) developed. 

It should be argued that such an engine, which pumps technologies to the user
community, should have certain properties. The object would be to design an efficient
engine, i.e. the maximum amount of work product should get to the goal of insertion with
the minimum amount of resources consumed and wasted. It is suggested that a
cycle diagram, familiar to physicists, mechanical engineers and thermodynamicists, could
be used to evaluate the efficiency of the technology transfer engine. This approach is
similar to a Carnot cycle analysis using state points of entropy, temperature, and pressure.
Analysis of the engine suggests areas for additional work: the notion of
“squaring the Carnot cycle”; the Second Law Analysis; a description of the TechTx
engine in terms of the evolutionary software development process; and identification of a
software development entropy metric. Further, since this research has based its
foundation on physics and thermodynamics, we now have the full richness of those
disciplines potentially available. This will permit building on existing theory in these
areas with the language familiar to the scientist and engineer.

With such tools, a decision-maker would be able to determine the confidence that
a technology or group of technologies will arrive within a given time frame, within a
certain confidence limit. For example, a program might expect a portfolio of
technologies to arrive by year 06 with an 80% certainty, but the model might show that in
06 there is only a 60% certainty of availability under current trends (see Figure
VI-2). The desired 80% certainty would not be available until 08. If the technology is
not predicted to arrive as required, the model will point to the areas for remedy with a
prescriptive solution as to how to organize, train and equip in order to change the
confidence of arrival.
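
The program-office reading of the model amounts to a lookup on a certainty-versus-year curve. The lines below are a minimal sketch using only the two points quoted in the example above (60% in year 06, 80% in year 08). The function name and the dictionary representation are assumptions made for illustration.

```python
def earliest_year_meeting(certainty_by_year: dict, required: float):
    """Return the first year whose predicted certainty meets the requirement.

    Returns None if no year on the curve reaches the required certainty.
    """
    for year in sorted(certainty_by_year):
        if certainty_by_year[year] >= required:
            return year
    return None

# Points taken from the example in the text; years are two-digit labels.
curve = {6: 0.60, 8: 0.80}
need_year, need_certainty = 6, 0.80

predicted = earliest_year_meeting(curve, need_certainty)
slip = None if predicted is None else predicted - need_year
print(predicted, slip)
```

A positive slip is the signal to look for prescriptive changes that shift the curve to the left.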
[Figure: Program Office Use for Risk Assessment and Rx. Example: the program office wants
the technology by 06 with 80% certainty; analysis indicates 08. What nodes / programmatics
need to be put into place to shift the curve to the left? From the desired system curve,
algebraically solve for node response curve(s); determine how many and parallel / serial.]

Figure VI-2. Model Usage in Program Office Technology Risk Assessment.


While this research developed the general relationships of properties, the
application code to do the analysis was limited to what was needed to generate data to
validate the relations. The application macros were written to be easily incorporated into
Microsoft Office applications. A user interface that permits a program manager or
technology policy maker to perform “what if” scenarios would be most useful.

The concept of entropy for a software technology transfer process is defined.
This entropy concept is also adapted to meet the character of an evolutionary software
development process. From this pivot point, with intensive properties such as
temperature and heat capacity now expressible in information units, a model can be
developed for the software technology transition engine. The model developed herein
has the features of communication and control system theory. It accommodates mixing
effects, chance, and the maturity of the individual organizational units to reflect a
learning organization unit consisting of people and machines. This was done by
separating microscopic issues from the macroscopic using the analysis of stable
dynamical systems, and relating the properties of these systems to the dynamics
of the system nodes.
B. SOFTWARE COMPLEXITY METRIC BASED ON DEGREES SABOE

As we saw in Chapter III, we have a relationship to the Weibull probability
distribution function.

    P(q_i) = e^{-\left( \frac{q_i - \gamma}{\beta} \right)^{\alpha}}    Weibull Distribution    (6.1)

The Weibull is used in other research (Nogeria 2000) to address a number of factors
that affect an evolutionary process. In Nogeria’s case, it was used to model
requirements volatility, the efficiency of the performers in the process and the size of a
software artifact as indicated by a complexity. It should also be noted that in that study,
the independent variable was time. In this case, we are addressing messages, q_i, or the
structure of the artifact to determine a measure of complexity (temperature). The number
of messages processed in a time step can be converted to time as an independent variable
with some mathematical manipulation. This can be related to the learning curves.

There was some difficulty in addressing complexity in that research. The use of
microstates of an alphabet, and of temperature, may contribute to advancing related
research efforts. In this case, we might let the x-axis shift of the Weibull γ → 0 and see
that α ≈ 1.

There is a close connection to Halstead metrics, as stated earlier. Halstead metrics
can be easily connected to the temperature. He determined the alphabet of operators and
operands. Looking carefully at his equations, he is very close to using entropy as a
metric, but just misses the connection by a simple division.

He defined the program volume V by

    V = N \log_2 \eta    (Halstead 1977, p. 19, eqn 3.1)

where N is related to the total usage of operators and operands. Defining each
operator and operand as a term, these are the instance counts (n) in this dissertation. The
number of distinct operators and operands (terms in our terminology) is his η. Had he
not used the actual numbers, but rather summed the products of the probabilities of
occurrence and the logs of the probabilities, he would have had Shannon’s equation,

    S = -\sum_{x \in X} p(x) \log_2 p(x),

and he would have had the (Saboe) entropy metric for the software.
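
The “simple division” can be seen by computing both quantities from the same token stream. This is a minimal sketch, assuming the program has already been tokenized into a flat list of operators and operands. The token list is a made-up example and the function names are illustrative.

```python
import math
from collections import Counter

def halstead_volume(tokens):
    """Halstead volume V = N * log2(eta): N total tokens, eta distinct tokens."""
    n_total = len(tokens)
    eta = len(set(tokens))
    return n_total * math.log2(eta)

def shannon_entropy(tokens):
    """Shannon entropy S = -sum p(x) log2 p(x), with p(x) = count(x) / N."""
    n_total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / n_total) * math.log2(c / n_total) for c in counts.values())

tokens = ["a", "=", "b", "+", "c", ";", "a", "=", "a", "+", "b", ";"]
v = halstead_volume(tokens)
s = shannon_entropy(tokens)
# V / N = log2(eta) is the entropy of a uniformly used alphabet, so it bounds S from above.
print(v / len(tokens), s)
```

Dividing each count by N turns usage counts into probabilities. The per-token volume V/N equals the Shannon entropy only when every operator and operand is used equally often.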


He related input data streams and program levels. Linking temperature, and 
entropy as defined in this dissertation to software volume, and length metrics of Halstead 
will bring the ability to quantify abstraction using the entropy contribution approach. 

Each meta level, partition, band or module i provides a contribution, C_i, to
the total population entropy. The local entropy S_{H_i} can be scaled based on the ratio
of the multiplicity Ω_i of terms in the band to the multiplicity Ω of terms in the
population, similar to the equations we introduced earlier. The total population’s entropy
is the sum of the contributions,

    S_H = \sum_{i=1}^{n\_bands} C_i    (6.2)

where

    C_i = \frac{|\Omega_i|}{|\Omega|} S_{H_i} + \frac{|\Omega_i|}{|\Omega|} \log \frac{|\Omega|}{|\Omega_i|}    (6.3)
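
The band contributions of (6.2) and (6.3) can be checked numerically against the entropy of the whole population. The sketch below assumes that the band multiplicity counts the term instances falling in band i, that the population multiplicity counts all instances, and that every term belongs to exactly one band. The band names and counts are made-up illustration data.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a list of instance counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def band_contribution(band_counts, population_size):
    """C_i per (6.3): w * S_Hi + w * log2(1 / w), where w is the band's share."""
    band_size = sum(band_counts)
    weight = band_size / population_size
    return weight * entropy(band_counts) + weight * math.log2(population_size / band_size)

# Made-up instance counts per term, grouped into three bands.
bands = {
    "band_1": [8, 4, 4],
    "band_2": [2, 2, 2, 2],
    "band_3": [1, 1],
}
population = [c for counts in bands.values() for c in counts]
total_instances = sum(population)

total_from_bands = sum(band_contribution(c, total_instances) for c in bands.values())
print(total_from_bands, entropy(population))  # the two values agree
```

The first term of each contribution is the band's internal entropy weighted by its share. The second term accounts for the uncertainty about which band a term instance falls in.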


Halstead was instead limited to a programming language view. Here we can start to
deal with abstraction and complexity, a subject he is careful to say is not addressed.

Halstead did not use the notion of q-levels. This can make a great difference in
the power of his metrics and provides one of the missing ingredients, temperature.

The linkage of the dynamical equations can be shown through McCabe’s metric,
cyclomatic complexity.

Going through Halstead metrics, we can get to the Stroud number. This is related 
to tasks (decisions) per time step. That linkage will be suggested as an area of future 
research in the learning curve section. 

C. TECHTX ENTROPY LEARNING CURVE MODEL, MICRO LEVEL DATA ANALYSIS

1. Nodal Performance Data 

As we saw in Chapter III, in order to get to the right level of granularity, the
criteria and bands for the performing organization nodes are assessed and presented. The
distribution of the performance index for the complete data set is shown in four bands.
The capacity performance index over time is shown for each of the bands. This
represents the best that the band can do (on average) at the time of performance. The
entropy is allocated to the performing nodes (affiliated organizations) using a per capita
rate in a band.

The output entropy is allocated from the message to individual performers from
the empirical data. This micro level is then summed up and allocated to the
affiliated organizational level. The organizations are banded based on a distribution of
the cumulative number of published messages. This accumulation of experience runs from
the beginning of the data set to the performance time step. In this case,
that is 22 years. An example of the distribution is shown in Figure VI-3. We see the
standard cast of high performing nodes. These are world class research organizations
(with the Naval Postgraduate School in the top 15 of over 1500 organizations).

[Figure: Productivity Distribution (Ada)]



Figure VI-3 Performing Organization Distribution Bands at End of Data Set 


D. MOLECULAR AND BIOLOGICALLY INSPIRED COMPUTING 

Molecular and biologically inspired computing could build on the
relationships developed in this research. In the future, it is possible that we will be unable
to “program” molecular computers as we do today. We will want to grow software. The
software will likely compute in a manner similar to biological systems that evolve. Such
systems will likely use patterns and associations, and move in the direction of least
resistance and maximum potential. The model development in Chapter III addressed
relationships, changes, and lock-in effects for the technology in question. It may be
adaptable to the more general class of evolutionary systems.

One could synthesize attractors and repellors (sources and sinks) to guide the
process. This is similar to a strategy that develops macroeconomic and
environmental effects to drive technologies from an evolutionary growth aspect.

The research proposed a software technology transition cycle analysis approach.
This permits analysis of various approaches for policy and investment trades. Tools that
build on this analysis approach can help identify leverage points and opportunities to
accelerate progress with a repeatable and rigorous approach.

In this type of environment, we can make the macroscopic/microscopic
connection explicit. The work should provide an axiomatic
development for a second law analysis, that is, an analysis of the inefficiencies,
which in turn provides a mechanism for feedback.

Future work addresses implementing the model in an organization, writing policy 
to enable the realization of the model and experimentation to validate the theory. 

APPENDIX A INFORMATION, CONTROL THEORY AND
EVOLUTIONARY DYNAMICAL SYSTEMS BASICS


This appendix reviews some basics of information theory, beyond what is given
in Chapter 3. A discussion of the meaning of an operator, the significance of the
eigenvalue, and the basics of Markov chains is provided. The content can be found in
several common graduate texts, but the references here are readily related to the usage of
symbolic dynamics and information in the technology transfer models
developed in this dissertation. No attempt is made to develop these topics beyond the
very basics; the aim is to give the reader a quick primer on the subject. All
theorems come directly from the reference documents. Those references contain
examples and narrative that can provide a deeper understanding beyond this
appendix primer. The references also provide explicit details on properties and
conditions that must be observed for the theorems to hold.

After the presentation of these topics, the reader is provided with a very brief
discussion on the relationship of randomness and complexity. Further research can move
forward using these concepts, minimizing the burden of taming the mathematical notions.
This appendix provides some of the deeper mathematical and physics advances reflected
in the technology transition models, and what is speculated to be relevant to evolutionary
software development and to software itself. Further discussion on the math and physics
utilized can be found in Prigogine (Prigogine 1983, 1987, 1997), Shannon (Shannon
1948), Jaynes (Jaynes 1957a,b), Kolmogorov (Kolmogorov 1965), Farmer (Farmer
1983), Baker (Baker 1990), and Brown (Brown 1992, 1992a, 1993, 1993a,b,c,d, 1995,
1996, 1996a, 1997, 1998, 1999, 1999a). Of these, the references from Shannon and
Prigogine are the best place to start. Reasonably readable graduate textbooks on
information theory and Kolmogorov complexity are Cover (Cover 1991) and Li (Li 1993),
respectively. Baker’s text on nonlinear systems and dynamical fundamentals (Baker
1990) is an easy place to start.





A. INFORMATION THEORY


What follows is a basic review of entropy in information theory after Shannon,
Jaynes, Kolmogorov, Uspensky, and others as found in Li (Li 1993) and Cover (Cover
1991). This review section is drawn from Cover (Cover 1991, p13).

Let X be a discrete random variable with alphabet $\Xi$ and a probability mass
function $p(x) = \Pr\{X = x\}$, $x \in \Xi$. Here p(x) and p(y) refer to two different random
variables and are in fact two different probability mass functions, $p_X(x)$ and $p_Y(y)$.
For the alphabet, with the given probability mass function, the definition of information
entropy is:

$S_H(X) = -\sum_{x \in \Xi} p(x) \log_2 p(x)$ (A.1)

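$S_H$ is the entropy measured in bits, and the log is base 2. $\log_2$ will be assumed
throughout unless otherwise noted. For example, the entropy of a fair coin toss is 1 bit.
The convention $0 \log 0 = 0$ is used, which follows from continuity, since $x \log x \to 0$
as $x \to 0^+$. To see this, write $x \log x = \frac{\log x}{1/x}$. Since
$\lim_{x \to 0^+} \log x = -\infty$ and $\lim_{x \to 0^+} 1/x = \infty$, the quotient has the form
$\infty/\infty$, and L’Hopital’s rule gives, up to a constant factor that depends on the base of
the log, $\lim_{x \to 0^+} \frac{1/x}{-1/x^2} = \lim_{x \to 0^+} (-x) = 0$ (Kreyszig 1993 p500).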

The base of the log is two for the natural units of information entropy as 
developed by Shannon (Shannon 1948). The entropy is a function of the distribution of 
X. It does not depend on the actual values taken by the random variable X, but only on 
the probabilities. 

If X~p(x), meaning that X is distributed according to p(x) so that the probabilities are
representative of each element’s usage over the alphabet, then the expected value E of a
random variable g(X) is denoted

$E_{p(x)}\, g(X) = \sum_{x \in \Xi} g(x)\, p(x)$ (A.2)

The entropy of a random variable X can be interpreted as the expected value
of $\log \frac{1}{p(X)}$, where X is drawn according to the probability mass function p(x). Thus

$E_{p(x)} \log \frac{1}{p(X)} = \sum_{x \in \Xi} \log \frac{1}{p(x)}\, p(x) = -\sum_{x \in \Xi} \log p(x)\, p(x) = S_H$ (A.3)
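
The equivalence of (A.1) and (A.3) can be checked numerically. The following is a minimal sketch; the function names and the two example distributions are illustrative assumptions, and the expectation form requires every listed probability to be strictly positive.

```python
import math

def expected_value(pmf, g):
    """E[g(X)] = sum over x of g(x) p(x), for a pmf given as {outcome: probability}."""
    return sum(p * g(x) for x, p in pmf.items())

def entropy_bits(pmf):
    """S_H(X) = -sum p(x) log2 p(x), using the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def entropy_as_expectation(pmf):
    """S_H(X) as the expected value of log2(1 / p(X)); needs all p(x) > 0."""
    return expected_value(pmf, lambda x: math.log2(1 / pmf[x]))

fair_coin = {"heads": 0.5, "tails": 0.5}
skewed = {"a": 0.7, "b": 0.2, "c": 0.1}
```

Both routes give 1 bit for the fair coin, and they agree for the skewed distribution as well.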


1. Maximum Entropy - Equal Probabilities 


Here is an example. Let us have a system where there are only two choices.

$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$ (A.4)

then

$S_H(X) = -p \log p - (1 - p) \log(1 - p) = S_H(p)$ (A.5)

We see that $S_H$ = 1 bit when p=1/2. Figure A-1 shows the basic properties of
entropy. It is a concave function of the distribution and equals 0 when p=0 or 1. This
makes sense because when p=0 or 1, the variable is not random and there is no
uncertainty. The entropy is maximum when p=.5, which corresponds to the maximum
uncertainty.
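
A short sketch of (A.5) makes these properties concrete. The grid of probabilities is an arbitrary choice for illustration; the function simply evaluates the two-choice entropy and the search confirms that the peak on the grid sits at p = 0.5.

```python
import math

def binary_entropy(p):
    """S_H(p) = -p log2 p - (1 - p) log2 (1 - p), with S_H(0) = S_H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Evaluate on a coarse grid of probabilities and locate the maximum.
grid = [i / 100 for i in range(101)]
peak = max(grid, key=binary_entropy)
```

The function is symmetric about p = 0.5, so for example p = 0.2 and p = 0.8 give the same entropy.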

[Figure: Entropy vs Probability. Annotations: entropy $S_H = -\sum p(x) \log_2 p(x)$; two-choice case $S_H = -p \log p - (1-p) \log (1-p)$; expected value $E_{p(x)}\, g(X) = \sum g(x)\, p(x)$, with $E_{p(x)} \log \frac{1}{p(X)} = S_H$.]

Figure A-1 Entropy vs. Probability

Consider a system where input signals $X \in T$. Specifically, where X is a set
of terms,

$T = \{\text{term}\}, \quad 2^T = \{\text{msg}\}$ (A.6)

Where $2^T$ is the set of all the subsets of T, often called the power set. Here is an
example.

$T = \{A, B, C, D\}$

$2^T = \{\{\}, \{A\}, \{B\}, \{C\}, \{D\}, \{A,B\}, \{A,C\}, \ldots, \{A,B,C\}, \ldots, \{A,B,C,D\}\}$ (A.7)

Now when the number of elements is |T| = 4, we get $2^{|T|} = 2^4 = 16$. The
maximum entropy occurs when we have an equal distribution of terms. So for a message
set where each subset of terms appears only once we define $S_{H\max}$ as

$S_{H\max} = -\sum_{x \in 2^T} \frac{1}{2^{|T|}} \log_2 \frac{1}{2^{|T|}}$ (A.8)

In this example, the maximum entropy is $S_H = -\sum_{x \in 2^T} \frac{1}{16} \log_2 \frac{1}{16} = 4$. It
is easy to see that the maximum entropy will always be |T|, for the condition that all of
the sets of terms in the set are evenly distributed.


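$S_H(|\Omega|) = \sum_{i=1}^{|\Omega|} \frac{1}{|\Omega|} \log |\Omega| = \log |\Omega|$ (A.9)

$P = \{p_1, p_2, \ldots, p_n\} = \{\tfrac{1}{n}, \tfrac{1}{n}, \ldots, \tfrac{1}{n}\}$

$S_H(P) = -\sum_{i=1}^{n} p \log_2 p = \log_2 \frac{1}{p} = |T|$ for $p = \frac{1}{2^{|T|}}$ (A.10)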

The entropy maximum is at 1/p(X) or |T|, or the number of sets of terms in the
alphabet T. In Figure A-2 we see the effect of sets of terms that are evenly distributed.
In our model, we would not expect to see .5 < p(X) < 1, as a result of the integer number
of sets of terms. This is because a decision between two choices, one set of terms
and another set of terms, yields a probability of .5. If we have one choice, one set of
terms, we are certain of the answer, the probability is 1/1, and by definition $S_H$ = 0.
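
The four-term example can be reproduced with a brief sketch. The helper names are illustrative assumptions; the code enumerates the power set of T = {A, B, C, D} and evaluates the entropy of an even distribution over its 16 members, which comes out to |T| = 4 bits.

```python
import math
from itertools import combinations

def power_set(terms):
    """All subsets of the term alphabet, from the empty set up to the full set."""
    return [frozenset(c) for r in range(len(terms) + 1) for c in combinations(terms, r)]

def uniform_entropy(n_outcomes):
    """Entropy in bits when each of n_outcomes is equally likely."""
    p = 1 / n_outcomes
    return -sum(p * math.log2(p) for _ in range(n_outcomes))

terms = ["A", "B", "C", "D"]
messages = power_set(terms)
max_entropy = uniform_entropy(len(messages))
```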

[Figure: Maximum Entropy. Entropy vs 1/|T|, i.e. p(X); x-axis: probability of occurrence p(X). Annotation: the entropy maximum is 1/p(X), or the number of terms in the power set.]

Figure A-2 Even distribution of terms, yields maximum entropy


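Here is another example. Let

$X = \begin{cases} a & \text{with probability } 1/2, \\ b & \text{with probability } 1/4, \\ c & \text{with probability } 1/8, \\ d & \text{with probability } 1/8. \end{cases}$ (A.11)

The entropy of X is

$S_H(X) = -\frac{1}{2}\log\frac{1}{2} - \frac{1}{4}\log\frac{1}{4} - \frac{1}{8}\log\frac{1}{8} - \frac{1}{8}\log\frac{1}{8} = \frac{7}{4}$ bits (A.12)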

Suppose we wish to determine the value of X with the minimum number of binary
questions. An efficient first question is “Is X=a?” This splits the probability in half. If
the answer to the question is no, the second question can be, “Is X=b?” The third
question is “Is X=c?” The resulting expected number of binary questions is 1.75. This
turns out to be the minimum expected number of binary questions required to determine
the value of X. It can be shown that the minimum expected number of binary questions
required to determine X lies between $S_H(X)$ and $S_H(X) + 1$.
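
The count of 1.75 questions can be verified directly. In this sketch the dictionary `questions_needed` is an assumption that encodes the questioning order described above: outcome a is settled after one question, b after two, and c or d after three.

```python
import math

pmf = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Number of yes/no questions asked before each outcome is identified when the
# questions are "Is X=a?", then "Is X=b?", then "Is X=c?".
questions_needed = {"a": 1, "b": 2, "c": 3, "d": 3}

expected_questions = sum(pmf[x] * questions_needed[x] for x in pmf)
entropy = -sum(p * math.log2(p) for p in pmf.values())
```

For this distribution the expected number of questions equals the entropy exactly, because every probability is a power of 1/2.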

Let’s now introduce the definitions for joint and conditional entropy and mutual
information. These are key facets of the technology transfer models proposed.

2. Joint Entropy 

The joint entropy $S_H(X,Y)$ of a pair of discrete random variables (X,Y) with a joint
distribution p(x,y) is defined by treating the pair (X,Y) as a single vector-valued random
variable. The joint probability p(x,y) is the probability of a joint occurrence of
event X=x and event Y=y. This leads to

$S_H(X,Y) = -\sum_{x \in \Xi} \sum_{y \in \Psi} p(x,y) \log p(x,y)$ (A.13)

which can also be expressed as

$S_H(X,Y) = -E \log p(X,Y)$ (A.14)


3. Conditional Entropy 

The conditional entropy of a random variable given another is defined as the
expected value of the entropies of the conditional distributions, averaged over the
conditioning random variable. If (X,Y)~p(x,y), the conditional probability p(x|y) is the
probability of outcome X=x given outcome Y=y for random variables (not necessarily
independent). The conditional entropy $S_H(Y|X)$ is

$S_H(Y|X) = \sum_{x \in \Xi} p(x)\, S_H(Y|X = x)$ (A.15)

$= -\sum_{x \in \Xi} p(x) \sum_{y \in \Psi} p(y|x) \log p(y|x)$ (A.16)

$= -\sum_{x \in \Xi} \sum_{y \in \Psi} p(x,y) \log p(y|x)$ (A.17)

$= -E_{p(x,y)} \log p(Y|X)$ (A.18)


This is shown in the Venn diagram in Figure A-3. The mutual information is 
given as I(X;Y). 


[Figure: Mutual Information and Entropy (Conditional). Venn diagram annotated with $I(X;Y) = S_H(X) - S_H(X|Y)$ (1) and $I(Y;X) = S_H(Y) - S_H(Y|X)$ (2).]

Figure A-3 Mutual Information, Joint and Conditional Entropy

Referring to Figure A-3, for the models proposed, the entropy of the vocabulary of
terms at time step k is the input entropy $S_H(X)$. The joint entropy $S_H(X,Y)$ is the
cumulative entropy at time step k+1. $S_H(Y)$ is the incremental contribution of
time step k+1. The mutual information, I(X;Y), can be calculated from equation (3) in
Figure A-3, given the data for the input entropy, the incremental contribution, and the
joint entropy. Using Figure A-3, equations (2) and (3), the conditional entropy is readily
computed.

Let’s look at an example with a vocabulary of 4 terms {A,B,C,G} in a digram.
We begin by building a matrix with the headings on the rows and columns being
elements of the vocabulary. The frequency of the terms occurring together is given in the
cell. When we have the term AB, with the A in the row (this is the input X), then we
have A appearing in the same message as B, given in the column heading. In the models
proposed, we are not concerned about the order of the terms, i.e. which precedes which;
we are satisfied to know that a term appears with another. This is because we are using a
message as represented by the record’s index terms. In free text implementations without
controlled indexing, such as the Internet or data mining, the case where A is the
column heading and B is the row heading means that B precedes A. In our models we
actually count the pairs, triples, etc. and build the vocabulary, since the languages of the
technologies are generally small. Typically we see about 2000-3000 single terms.

Actually, there are sets of subsets {}, {A}, {B}, {C}, {G}, {AB}, {AC}, ...
{ABC}, {ACG}, ... {ABCG} as possible “terms”. To get the count of all of the
permutations for triples, quadruples, etc., the process can be repeated with pairs as the
row headings and singles as the column headings. Similarly for quadruples, once the
triples are computed, we can use the triples as the row headings and singles again as
column headings. For our purposes, we have simplified the matrix for the
example. A {} preceding the column term could be arranged to imply that a new term
has been added to the vocabulary in this time step.

Figure A-4 Example 1, Vocabulary Distribution

In Figure A-4, the entropy for the example 1 (ex1) marginal distribution of X,
$S_H(X)_{ex1}$ with probabilities (.25, .25, .25, .25), is 2 bits. The entropy of the marginal
distribution of Y, $S_H(Y)_{ex1}$ with probabilities (.5, .25, .125, .125), is 1.75 or 7/4 bits.
The conditional entropy of the X outcome given Y, $S_H(X|Y)_{ex1}$, is 1.625 or 13/8 bits.
The conditional entropy of the Y outcome given X, $S_H(Y|X)_{ex1}$, is 1.375 or 11/8 bits.
The joint entropy comes from the probability of a joint occurrence. $S_H(X,Y)_{ex1}$ is
3.375 or 27/8 bits.

$S_H(X,Y) \le S_H(X) + S_H(Y)$ (A.19)


There is equality only in the case where X and Y are independent. In all of these
equations, the entropy quantity on the left side increases as we choose probabilities on the
right hand side more equally (recall Figure A-3). The mutual information I(X;Y) is
computed as

$I(X;Y) = S_H(X) - S_H(X|Y)$ (A.20)

In this example, the mutual information is 2 - 1.625 = .375 or 3/8 bits.
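
The quantities quoted for example 1 can be recomputed from a joint table. The table below is a hypothetical joint distribution chosen only because its marginals match the ones quoted above; it is not taken from Figure A-4. The sketch uses the chain rule $S_H(X|Y) = S_H(X,Y) - S_H(Y)$, a standard identity, to obtain the conditional entropies.

```python
import math
from fractions import Fraction as F

# Hypothetical joint table p(x, y): rows are X outcomes, columns are Y outcomes.
joint = [
    [F(1, 8),  F(1, 16), F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 8),  F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 16), F(1, 16), F(1, 16)],
    [F(1, 4),  F(0),     F(0),     F(0)],
]

def entropy(probabilities):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(float(p) * math.log2(float(p)) for p in probabilities if p > 0)

p_x = [sum(row) for row in joint]            # marginal of X (row sums)
p_y = [sum(col) for col in zip(*joint)]      # marginal of Y (column sums)

s_x = entropy(p_x)
s_y = entropy(p_y)
s_xy = entropy(p for row in joint for p in row)
s_x_given_y = s_xy - s_y                     # chain rule
s_y_given_x = s_xy - s_x
mutual_information = s_x - s_x_given_y       # equation (A.20)
```

With this table the results are 2, 1.75, 3.375, 1.625, 1.375 and 0.375 bits respectively, and the inequality (A.19) holds strictly because X and Y are not independent.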

B. OPERATORS, EIGENVALUE SIGNIFICANCE, MARKOV CHAINS,
ERGODIC PROCESSES

1. Operators and Eigenvalue Significance 

In Chapter 3, we introduced the function $X_{k+1} = F(X_k)$. Let’s put this more
specifically in terms of a distribution function p(x) and provide an overview of the
recurrence relation

$p_{k+1}(x) = U p_k(x)$ (A.21)

The distribution function $p_{k+1}(x)$ after (k+1) maps is obtained by the action of the
operator U on $p_k(x)$, which is the distribution function after k maps. Let’s consider what
we all know from the mathematics of periodic functions such as $\sin\left(\frac{2\pi x}{\lambda}\right)$. This function
remains invariant when we add to the coordinate x the wavelength $\lambda$, as

$\sin \frac{2\pi x}{\lambda} = \sin \frac{2\pi (x + \lambda)}{\lambda}$

Other periodic functions are $\cos \frac{2\pi x}{\lambda}$ or the more complex combination

$e^{i 2\pi x / \lambda} = \cos \frac{2\pi x}{\lambda} + i \sin \frac{2\pi x}{\lambda}$

With that notion in hand, what follows is a discussion by Prigogine for a quick 
review (Prigogine 1997, p92). 

An operator is a prescription on how to act on a given function; it may
involve multiplication, division, differentiation, or any other mathematical
operation. In order to define an operator, we define a function space.
That is, we specify the domain, the types of functions it acts on, indicate
whether they are continuous or bounded, and other characteristics as
required. In general an operator U acting on a function f(x) transforms it
into a different function. For instance, if U is a derivative operator, $\frac{d}{dx}$,
then $U x^2 = 2x$. However, there are special functions known as
eigenfunctions of the operator, which remain invariant when we apply U;
they are multiplied only by a number, the eigenvalue. In the above
example $e^{kx}$ is an eigenfunction to which the eigenvalue k corresponds. A
fundamental theorem in operator analysis states that we can express an
operator in terms of its eigenfunctions and eigenvalues, both of which
depend on its function space.

Physicists use Hilbert space in quantum mechanics. Prigogine goes beyond 
Hilbert space for operations in unstable dynamical systems. 

Consider the “equations of motion”

$x_{k+1} = x_k + \frac{1}{2}$, modulo 1 (i.e. the numbers between 0 and 1).

See Figure A-5 for this simple periodic map. After two shifts we are back to the
initial point, i.e., $x_0 = \frac{1}{4}$, $x_1 = \frac{3}{4}$, $x_2 = \frac{5}{4} = \frac{1}{4}$ (modulo 1).

Instead of using individual points located by trajectories, we are using
ensembles represented by the probability distribution p(x). A trajectory
corresponds to a set of ensembles where the coordinate x takes on a well-defined
value $x_k$, and the distribution function p is reduced to a single
point. This can be written as

$p_k(x) = \delta(x - x_k)$

Here delta, $\delta$, is a symbol for a function¹ that vanishes for all values of x
except $x = x_k$. By using the distribution function p, the mapping can be
expressed as a relation between $p_{k+1}(x)$ and $p_k(x)$. We can then write
$p_{k+1}(x) = U p_k(x)$. Formally, U is known as the Perron-Frobenius
operator acting on $p_k(x)$. The ensemble description must allow the
trajectory description as a special case. So, we therefore have
$\delta(x - x_{k+1}) = U \delta(x - x_k)$. This is just rewriting the equation of motion, as
$x_k$ becomes $x_{k+1}$ after one shift.


1 This is called the Dirac delta function. (Prigogine 1997 p33) 

[Figure: simple geometrical construction that moves from initial point $P_0$ to the next point $P_1$ according to the map $x_{k+1} = x_k + 1/2$. We go from $P_0$ to P′, then to P″ on the bisector, and from there to $P_1$. If we start with $P_1$ we come back to $P_0$.]

Figure A-5 Periodic Map (Source: Prigogine 1997, p82)

The simplest example of deterministic chaos is a Bernoulli map. In a
Bernoulli map the value of a number doubles every time step, with the
value of the number kept between 0 and 1. Consider the equation

$x_{k+1} = 2x_k$, modulo 1 (i.e. dealing with numbers between 0 and 1).

The equation of motion is again deterministic, since once we know $x_k$, the
number $x_{k+1}$ is determined. As the coordinate x is multiplied by two each
time step, the distance between two trajectories will be

$2^k = e^{k \log 2}$, modulo 1.

In terms of continuous time, t, this can be written as
$e^{\lambda t}$ with $\lambda = \log 2$,

where $\lambda$ is called the Lyapunov exponent. This shows the trajectories
diverge exponentially, and is the signature of deterministic chaos. This is
a dynamical process leading to randomness. What Prigogine does that
is new, and what we exploit here, is the statistical formulation of the Bernoulli
map, which links randomness to operator theory.
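
The exponential divergence can be seen in a short numerical sketch. The starting point 1/7 and the initial separation $2^{-30}$ are arbitrary illustrative choices, and exact rational arithmetic is used so that the doubling is not disturbed by rounding. The separation between the two trajectories doubles each step, and the estimated growth rate is log 2.

```python
import math
from fractions import Fraction

def bernoulli_map(x):
    """One step of the Bernoulli map: x -> 2x modulo 1."""
    return (2 * x) % 1

def circular_distance(a, b):
    """Distance between two points of the unit interval with 0 and 1 identified."""
    d = abs(a - b) % 1
    return min(d, 1 - d)

def separation_history(x0, delta, steps):
    """Separation of two trajectories started at x0 and x0 + delta."""
    a, b = x0, x0 + delta
    history = []
    for _ in range(steps + 1):
        history.append(circular_distance(a, b))
        a, b = bernoulli_map(a), bernoulli_map(b)
    return history

steps = 20
history = separation_history(Fraction(1, 7), Fraction(1, 2**30), steps)
# Average exponential growth rate of the separation per time step.
lyapunov_estimate = math.log(float(history[-1] / history[0])) / steps
```

The estimate only holds while the separation stays small compared with the size of the interval, which is why the number of steps is kept below 30 here.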

The explicit form of the evolution operator U is obtained as

$p_{k+1}(x) = U p_k(x) = \frac{1}{2}\left[ p_k\left(\frac{x}{2}\right) + p_k\left(\frac{x+1}{2}\right) \right]$ (A.22)

This equation means that after (k+1) iterations, the probability $p_{k+1}(x)$ is
determined by the values of $p_k$ at the points $\frac{x}{2}$ and $\frac{x+1}{2}$. As a consequence of the
form of U, if $p_k$ is a constant equal to $\alpha$, $p_{k+1}$ is equal to $\alpha$, since $U\alpha = \alpha$.
The uniform distribution $p = \alpha$, which corresponds to equilibrium, is the
distribution function reached through the iteration of the shift, for $k \to \infty$.

On the other hand, we have the case when $p_k(x) = x$; here we have
$p_{k+1}(x) = \frac{1}{4} + \frac{x}{2}$. In other words, $Ux = \frac{1}{4} + \frac{x}{2}$, where the U operator
transforms the function x into a different function, $\frac{1}{4} + \frac{x}{2}$. We can find the
eigenfunctions as defined above, in which the operator reproduces the
same function multiplied by a constant. In the example

$U\left(x - \frac{1}{2}\right) = \frac{1}{2}\left(x - \frac{1}{2}\right)$ (A.23)

the eigenfunction is therefore $x - \frac{1}{2}$ and the eigenvalue is $\frac{1}{2}$. If we repeat
the Bernoulli map k times, we obtain

$U^k\left(x - \frac{1}{2}\right) = \left(\frac{1}{2}\right)^k \left(x - \frac{1}{2}\right)$ (A.24)

which moves toward 0 as $k \to \infty$. The contribution of $x - \frac{1}{2}$ to p(x) is
therefore rapidly damped at a rate related to the Lyapunov exponent. The function
$x - \frac{1}{2}$ turns out to belong to a class of polynomials called Bernoulli polynomials,
denoted $B_k(x)$, which are eigenfunctions of U with eigenvalues of $\left(\frac{1}{2}\right)^k$,
where k is the degree of the polynomial.
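
These statements about U can be checked with a small sketch, assuming the form of the operator for the Bernoulli map, $(Up)(x) = \frac{1}{2}[p(x/2) + p((x+1)/2)]$. The function names and sample points are illustrative. The second Bernoulli polynomial, $x^2 - x + \frac{1}{6}$, is included as a further test case with eigenvalue 1/4.

```python
from fractions import Fraction

def perron_frobenius(p):
    """Return the function U p, where (U p)(x) = [p(x/2) + p((x+1)/2)] / 2."""
    return lambda x: (p(x / 2) + p((x + 1) / 2)) / 2

def iterate(p, k):
    """Apply the operator k times."""
    for _ in range(k):
        p = perron_frobenius(p)
    return p

constant = lambda x: Fraction(3)                 # a uniform (equilibrium) density
identity = lambda x: x                           # p(x) = x
b1 = lambda x: x - Fraction(1, 2)                # first Bernoulli polynomial
b2 = lambda x: x * x - x + Fraction(1, 6)        # second Bernoulli polynomial

samples = [Fraction(n, 10) for n in range(11)]   # exact test points in [0, 1]
```

Evaluating `perron_frobenius(b1)` at any sample point returns half of `b1` at that point, and five applications via `iterate(b1, 5)` shrink it by a factor of 32.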


Prigogine is careful to emphasize the distinction between “nice” functions
and “singular” functions. The latter are also called generalized functions or
distributions, which are not to be confused with probability distributions.
The simplest singular function is the delta function $\delta(x)$. $\delta(x - x_0)$ is zero
for all values where $x \ne x_0$, and infinite where $x = x_0$. Singular functions
have to be used with nice functions. For example, if f(x) is a continuous
function, $\int dx\, f(x)\, \delta(x - x_0) = f(x_0)$ has a well defined meaning. In contrast,
the integral containing the product of singular functions, such as
$\int dx\, \delta(x - x_0)\, \delta(x - x_0) = \delta(0) = \infty$, diverges and is meaningless.


Defining the operator U in terms of its eigenfunctions and eigenvalues is
called the spectral representation of the operator U. There is the set of
functions $B_k(x)$, the Bernoulli polynomials, which are nice functions, but
there is a second set, $\tilde{B}_k(x)$, which are formed by singular functions
related to the derivatives of the $\delta$ function. To obtain the spectral representation
of U and Up, we use both sets of eigenfunctions.

As a result, the statistical formulation of the Bernoulli map is applicable
only to nice probability functions p and not to single trajectories that
correspond to singular distribution functions represented by $\delta$-functions.
So the equivalence between the individual description, in terms of
trajectories represented by $\delta$-functions, and the statistical description is broken.
For the continuous distribution p, Prigogine obtained results that go beyond
trajectory theory.

We can calculate the rate of approach to equilibrium and therefore arrive at an
explicit dynamical formulation of irreversible processes that take place in
a Bernoulli map. The probability distribution takes into account the complex
microstructure of the phase space.

When using both the $B_k(x)$, which are nice functions, and the second set,
$\tilde{B}_k(x)$, which are singular functions, Prigogine moves from simple Hilbert
space to a rigged Hilbert space, or Gelfand space. He obtains an
irreversible spectral representation of the Perron-Frobenius operator as it
applies exclusively to nice probability distributions, and not individual
trajectories.


2. Bakers Transformation 

The attractor for the Bernoulli shift with an irrational initial condition $x_0$ is the
unit interval, with fractal dimension 1 (see Farmer 1983, Baker 1990 for a discussion of
fractal dimensions). The attractor for the bakers transformation is the unit square, with
fractal dimension 2. The dissipative bakers transformation is given (with a>0) by
combining the Bernoulli shift

$x_{k+1} = 2x_k$, modulo 1 (A.25)

with the mapping

$y_{k+1} = \begin{cases} a y_k, & 0 \le x_k < \frac{1}{2} \\ a y_k + \frac{1}{2}, & \frac{1}{2} \le x_k < 1 \end{cases}$ (A.26)

The transformation is dissipative for a<1/2, because

$J = \frac{\partial(x_{k+1}, y_{k+1})}{\partial(x_k, y_k)} = \begin{vmatrix} 2 & 0 \\ 0 & a \end{vmatrix} = 2a$ (A.27)

See McCauley (McCauley 1993, p132) for further discussion.

The Bernoulli map is not an invertible system. Because the arrow of time exists,
we have to describe the emergence of irreversibility in invertible dynamical systems. The
bakers map or bakers transformation is a generalization of the Bernoulli map (Prigogine
1997), (Tabor 1989), (Baker 1990), (Farmer 1983). Take a square that has sides of length
1. First flatten the square into a rectangle whose length is 2; then cut it in half and build a
new square. Figure A-6 illustrates this area preserving transformation, which is
similar to a baker rolling out dough. Since the distance between two points along the
horizontal coordinate doubles with each transformation, it will be multiplied by $2^k$ after k
transformations. Rewriting $2^k$ as $e^{k \log 2}$, with the number k of transformations serving
as a measure of time, the Lyapunov exponent is exactly as in the Bernoulli map. There is
also a second Lyapunov exponent with a negative value, -log 2, which corresponds to the
contracting direction of y.
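
The bakers transformation and its inverse can be sketched as follows. This is an illustration under stated assumptions: the half-open intervals for the two branches and the test point (3/10, 7/10) are choices made here, the parameter a follows (A.26), and the inverse is written only for the area preserving case a = 1/2. Exact fractions are used so that running the map backwards recovers the starting point exactly.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def baker_map(x, y, a=HALF):
    """One step of the bakers transformation on the unit square.

    a = 1/2 gives the area preserving map; a < 1/2 gives the dissipative map.
    """
    if x < HALF:
        return 2 * x, a * y
    return 2 * x - 1, a * y + HALF

def baker_inverse(x, y):
    """Inverse of the area preserving case a = 1/2."""
    if y < HALF:
        return x / 2, 2 * y
    return (x + 1) / 2, 2 * y - 1

def jacobian_determinant(a):
    """Determinant of [[2, 0], [0, a]]: the factor by which areas change per step."""
    return 2 * a

start = (Fraction(3, 10), Fraction(7, 10))
point = start
for _ in range(8):
    point = baker_map(*point)
forward = point
for _ in range(8):
    point = baker_inverse(*point)
recovered = point
```

A Jacobian determinant of 1 at a = 1/2 corresponds to area preservation, while any a below 1/2 contracts areas at every step.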

Prigogine and others show, when relating the bakers transformation to the
representation of a Bernoulli shift, that the information contained in the initial condition
contains the entire past history and future. Again from Prigogine,

The critical point is that for typical, irrational initial coordinates $x_0$, $y_0$,
the associated binary representations can yield a doubly infinite sequence
(from $k = -\infty$ to $k = +\infty$) as random as a fair coin toss. Thus a completely
deterministic dynamical system can yield results that appear completely
random. The bakers transformation also has the property of all dynamical
systems, recurrence. The bakers transformation is invertible, time
reversible, deterministic, recurrent and chaotic.

[Figure: Bakers Transformation. $(x_{k+1}, y_{k+1}) = (2x_k,\ y_k/2)$ for $0 \le x_k < 1/2$, and $(2x_k - 1,\ (y_k + 1)/2)$ for $1/2 \le x_k < 1$. Repeated doublings in the x direction and halving in the y direction lead to rapid mixing. The mapping is completely reversible. Run backwards, the doubling occurs in the y direction and halving occurs in the x direction.]

Figure A-6 Bakers Transformation


For the baker map there is a new element compared to the Bernoulli map
(Prigogine 1997, p104). Prigogine shows that the Perron-Frobenius
equation can be applied to both the future and the past.

$p_{k+1} = U p_k$

and

$p_{k-1} = U^{-1} p_k$

Here $U^{-1}$ is the inverse of U. For irreducible spectral representations,
there is an essential difference between past and future.

Prigogine’s research has also shown that irreversibility is linked only to 
Lyapunov time for general irreversible phenomena such as diffusion and various other 
transport processes. 

C. MARKOV CHAINS, ERGODIC PROCESSES

These sections provide a stand alone reference following Bronson (Bronson 1982).


1. Markov Process 

A Markov process is a process where the future evolution of a state depends only
on the present state. A Markov process (Bronson 1982, p224) consists of a set of objects
and a set of states such that

• at any given time an object must be in a state (distinct objects need not
be in distinct states);

• the probability that an object moves from one state to another state
(which may be the same as the first state) in one time period depends
only on those two states.

• The integral numbers of time periods past the moment when the
process is started represent the stages of the process, which may be
finite or infinite. If the number of states is finite or countably infinite,
the process is called a Markov chain. A finite Markov chain has a
finite number of states.

• $p_{ij}$ denotes the probability of moving from state i to state j in one time
step. For an N state Markov chain (where N is a fixed positive
integer), the N x N matrix $P = [p_{ij}]$ is the stochastic or transition matrix
associated with the process. The elements in each row of P must sum
to one (unity).

Theorem 19.1 (Bronson 1982 p224) states: Every stochastic matrix has 1
as an eigenvalue (possibly multiple), and none of the eigenvalues exceeds
1 in absolute value. Because of the way that P is defined, it is convenient
to indicate N-dimensional vectors as row vectors, with matrices operating
on them from the right. According to theorem 19.1 above, there exists a
vector $X \ne 0$ such that

$XP = X$

This left eigenvector is called a fixed point of P.


Powers of stochastic matrices are indexed by n. The n th power of matrix P
is indicated by $P^n = [p_{ij}^{(n)}]$. If P is stochastic, then $p_{ij}^{(n)}$ represents the
probability that an object moves from state i to state j in n time steps. It
follows that $P^n$ is also a stochastic matrix. We denote the proportion of
objects in state i at the end of the n th time step by $p_i^{(n)}$, and designate

$p^{(n)} = [p_1^{(n)}, p_2^{(n)}, \ldots, p_N^{(n)}]$

as the distribution vector for the end of the n th time step. Similarly,
$p^{(0)} = [p_1^{(0)}, p_2^{(0)}, \ldots, p_N^{(0)}]$
represents the proportion of the objects in each state at the beginning of
the process. $p^{(n)}$ is related to $p^{(0)}$ by the equation

$p^{(n)} = p^{(0)} P^n$ (Bronson 19.1)

In writing equation (Bronson 19.1), the proportion of the objects in state i that make
the transition to state j is implicitly identified with the probability $p_{ij}$.
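
The relation (Bronson 19.1) can be illustrated with a minimal sketch. The two state transition matrix and the initial distribution below are invented for the example, and the helper functions are plain list-based stand-ins for matrix arithmetic. Row vectors multiply the matrix from the left, as in the text.

```python
from fractions import Fraction as F

def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def mat_mul(a, b):
    """Matrix product, computed row by row."""
    return [vec_mat(row, b) for row in a]

def mat_pow(m, n):
    """n-th power of a square matrix by repeated multiplication."""
    size = len(m)
    result = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

# Hypothetical two state chain: each row sums to one.
P = [[F(9, 10), F(1, 10)],
     [F(1, 2),  F(1, 2)]]
p0 = [F(1), F(0)]                 # all objects start in state 1

P3 = mat_pow(P, 3)
p3 = vec_mat(p0, P3)              # distribution after three time steps
```

Stepping the distribution forward one time step at a time gives the same result as multiplying by the third power, and every row of the power still sums to one.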


2. Ergodic Process 

Again following Bronson (Bronson 1982 p 225) we define the properties required
for an ergodic process in terms of ergodic and regular matrices.

A stochastic matrix P is ergodic if $\lim_{n \to \infty} P^n$ exists; that is, if each $p_{ij}^{(n)}$ has a
limit as $n \to \infty$. The limit matrix, which is necessarily stochastic, is denoted by L. The
components of $p^{(\infty)}$, defined by the equation

$p^{(\infty)} = p^{(0)} L$ (Bronson 19.2)

are the limiting state distributions and represent the approximate
proportions of objects in the various states of the Markov chain after a
large number of time steps.

Theorem 19.2 (after Bronson 1982, p225) states

A stochastic matrix is ergodic if and only if the only eigenvalue $\lambda$ of
magnitude 1 is 1 itself and, if $\lambda = 1$ has multiplicity k, there exist k linearly
independent (left) eigenvectors associated with this eigenvalue.

Theorem 19.3 (after Bronson 1982, p225) states

If every eigenvalue of a matrix P yields linearly independent (left)
eigenvectors in number equal to its multiplicity, then there exists a
nonsingular matrix M, whose rows are left eigenvectors of P, such that $D = MPM^{-1}$
is a diagonal matrix. The diagonal elements of D are the eigenvalues of P,
repeated according to multiplicity. The convention is adopted of
positioning the eigenvectors corresponding to $\lambda = 1$ above all other
eigenvectors in M. Then for a diagonalizable, ergodic, NxN matrix P with
$\lambda = 1$ of multiplicity k, the limit matrix L may be calculated as

$L = M^{-1} \left( \lim_{n \to \infty} D^n \right) M = M^{-1}\, \mathrm{diag}(1, \ldots, 1, 0, \ldots, 0)\, M$ (Bronson 19.3)

The diagonal matrix on the right has k 1’s and (N-k) 0’s on the main
diagonal. A stochastic matrix is regular if one of its powers contains only
positive elements.


Theorem 19.4 (from Bronson 1982, p225) states

If a stochastic matrix is regular, then 1 is an eigenvalue of multiplicity
one, and all other eigenvalues satisfy $|\lambda| < 1$.

Theorem 19.5 (from Bronson 1982, p225) states

A regular matrix is ergodic.

If $P$ is regular, with limit matrix $L$, then the rows of $L$ are identical with
one another, each being the unique left eigenvector of $P$ associated with
the eigenvalue $\lambda = 1$ and having the sum of its components equal to unity.
Denote this eigenvector by $E_1$. It follows directly from (Bronson 19.2)
that if $P$ is regular, then regardless of the initial distribution $p^{(0)}$,

$$p^{(\infty)} = E_1 \qquad \text{(Bronson 19.4)}$$
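
The behaviour described by (Bronson 19.1) and (Bronson 19.4) can be checked numerically. The following is a minimal sketch only: the 2x2 matrix `P` is a hypothetical regular stochastic matrix chosen for illustration, not data from this dissertation, and the helper names are assumptions.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, n):
    """Return the n-th power of P (n >= 1) by repeated multiplication."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result


def distribution_after(p0, p, n):
    """Row-vector form of (Bronson 19.1): p(n) = p(0) P^n."""
    pn = mat_pow(p, n)
    size = len(p0)
    return [sum(p0[i] * pn[i][j] for i in range(size)) for j in range(size)]


# Hypothetical regular stochastic matrix: rows sum to 1, all entries positive.
P = [[0.9, 0.1],
     [0.4, 0.6]]

# A large power approximates the limit matrix L; its rows should coincide.
L_approx = mat_pow(P, 200)

# Two different starting distributions should reach the same limit.
limit_a = distribution_after([1.0, 0.0], P, 200)
limit_b = distribution_after([0.0, 1.0], P, 200)

print(L_approx)
print(limit_a, limit_b)
```

For this illustrative matrix both rows of `L_approx`, and both limiting distributions, approach the same vector, which is what regularity predicts.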


Figure A-7 and Figure A-8 provide an example of the state transition rules in a 
communication context after Shannon, and an example of a two state Markov chain. 


Markov Processes: State Transition Rules

• Stochastic processes known as Markov processes

• There exists a finite number of possible states in the system: S1, S2, ..., Sn

• There is a set of transition probabilities:

- p_i(j), the probability that if the system is in state Si it will go next to state Sj

• A state will correspond to a "residue" of influence from preceding
messages

• The processes will be ergodic


Figure A-7 Markov Process State Transition Rules

Example Two State Markov Chain with Probability Transition Matrix

$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}$$

• Stationary distribution

- represented by the vector $p$

- components are the stationary probabilities of state 1 and state 2, respectively

• Stationary probabilities are found by solving $pP = p$, or by balancing the probabilities

• For the stationary distribution, the net probability flow across any cut-set
in the state transition graph is 0:

$$p_1 \alpha = p_2 \beta$$

Since $p_1 + p_2 = 1$,

$$p_1 = \frac{\beta}{\alpha + \beta}, \qquad p_2 = \frac{\alpha}{\alpha + \beta}$$

• Entropy of the state $X_n$ at time $n$ is

$$S(X_n) = S\left( \frac{\beta}{\alpha + \beta}, \frac{\alpha}{\alpha + \beta} \right)$$


Figure A-8 Example of Two State Markov Chain 
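
As a small worked illustration of the two-state chain, the sketch below evaluates the stationary probabilities from the balance condition and the entropy of the stationary state in bits. The values of `alpha` and `beta` are hypothetical and the function names are assumptions made for this example.

```python
from math import log2


def stationary_two_state(alpha, beta):
    """Stationary probabilities (p1, p2) of the two-state chain.

    alpha is the probability of moving from state 1 to state 2,
    beta the probability of moving from state 2 to state 1.
    """
    total = alpha + beta
    return beta / total, alpha / total


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


# Hypothetical transition probabilities, for illustration only.
alpha, beta = 0.1, 0.4
p1, p2 = stationary_two_state(alpha, beta)

# Balance condition: flow from state 1 to 2 equals flow from 2 to 1.
flow_1_to_2 = p1 * alpha
flow_2_to_1 = p2 * beta

state_entropy = entropy_bits((p1, p2))
print(p1, p2, flow_1_to_2, flow_2_to_1, state_entropy)
```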


D. SYMBOLIC DYNAMICS AND INFORMATION 


We consider a system in discrete state space. The development that follows is
structured closely on the exposition by Prigogine (Prigogine 1987, p183) on symbolic
dynamics and information.

Establish a probability distribution underlying a process. Set up balance
equations that count the processes leading the system to state Q and the processes
removing it from that state (Prigogine 1987, p153).

We get

$$\begin{aligned}
\frac{d\,\mathrm{prob}(Q, t)}{dt} &= (\text{contribution of transitions to state } Q \text{ per unit time}) \\
&\quad - (\text{contribution of transitions from state } Q \text{ per unit time}) \\
&= R_+(Q) - R_-(Q)
\end{aligned} \qquad \text{(Prigogine 1987, eqn 4.9)}$$

which becomes a problem of determining the transition rates $R_+$ and $R_-$.
The system must satisfy conditions of detailed balance following the
constraint conditions, similar to those of thermodynamics, or similarly
the Markov processes above. So if we decompose $R_+$ and $R_-$ into the
elementary processes taking place in the system,

$$R_\pm = \sum_k r_{k,\pm}$$

the following local condition must be satisfied:

$$(r_{k,+})_{equil} = (r_{k,-})_{equil} \qquad \text{(Prigogine 1987, eq 4.10)}$$

These relations must in turn be compatible with the form of the probability
distribution in the state of equilibrium, which is known from statistical
mechanics. The limiting case of such a distribution is a Poisson
distribution. Einstein showed that, at equilibrium, the probability of
fluctuations is entirely determined by thermodynamic quantities. In an
isolated system (i.e. in a control volume), Boltzmann's formula gives

$$S = k_B \ln(\text{number of molecular arrangements compatible with a given energy value})$$


From Shannon's Theorem 2 we know that Boltzmann's constant $k_B$ can be eliminated and
the natural log converted to $\log_2$ (Shannon 1948, p11, p1)
for the measure of entropy in information units. Since Jaynes (Jaynes 1956, 1956a)
developed the relationship of information and communication theory to statistical
mechanics, we can invert the equation and write

$$P_{equil} \sim 2^{\Delta S} \qquad \text{(After Prigogine 1987, eq 4.11)}$$

where $\Delta S$ is the change in entropy due to fluctuation, $\Delta S = S(Q) - S(Q_a)$.
Prigogine also requires that (Prigogine 1987, eq 4.9),


...in a limiting sense, must reduce to the evolution dealt with in the
deterministic description. We expect that macroscopic observations will
yield values representative of the most probable state in a physical system.
Looking at this mathematically, we would expect the same for a
communication channel, i.e. that the peaks of $P(Q,t)$ be solutions to
deterministic equations.

If the system is uni-modal (Figure A-9), which is our case in an evolutionary
system, this implies that the equation for the mean value is close to the deterministic
equation; the correction is essentially proportional to an inverse power of the size of the
system.


Figure A-9 Uni-modal Distribution

Let $\{Q_i\}$ ($i = 1, 2, \ldots$) be accessible states of a system. These states $\{Q_i\}$ are
chosen so that the time evolution defines a Markov process.

APPENDIX B EQUATIONS AND SAMPLE CALCULATIONS


This section provides the equations used in completing the required calculations.
The tables and data were extracted from a supporting document (Behnke 2001), which
describes the calculations. They represent example data, not necessarily the functions
used for the data sets in the final dissertation. For example, the power function is
explained here, as opposed to the linear relationship of entropy versus time step.

1. Entropy Calculation Equations and Example

The formula used for each term in this calculation is the following:

-(probability of term usage) * log2(probability of term usage) (B.1)

Probability of term usage is the cumulative number of a single term's instances up to the
given time interval divided by the number of term instances for all the terms up to the
given time interval. The cumulative entropy for a time interval is the sum of (B.1) over
all terms. The following two tables give an example of the calculation:



Term                  1989   1990   1991
A - # of instances       2      3      5
B - # of instances       1      5     18
Local sum                3      8     23
Cumulative sum           3     11     34


Table B-1 Sample Calculation Data


Term                 1989                           1990                            1991
Entropy of A         -(2/3) * log2(2/3) = 0.3900    -(5/11) * log2(5/11) = 0.5170   -(10/34) * log2(10/34) = 0.5193
Entropy of B         -(1/3) * log2(1/3) = 0.5283    -(6/11) * log2(6/11) = 0.4770   -(24/34) * log2(24/34) = 0.3547
Cumulative entropy   0.9183 (a + b)                 0.9940                          0.8740


Table B-2 Sample Calculation Equation with Data
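
The sample calculation can be reproduced with a few lines of code. This is a minimal sketch, not the macro used for the dissertation: it takes the per-year counts of Table B-1, accumulates them, and sums the per-term contribution of equation (B.1). The function and variable names are assumptions introduced for the example.

```python
from itertools import accumulate
from math import log2

# Per-year instance counts for terms A and B (1989, 1990, 1991), from Table B-1.
yearly_counts = {"A": [2, 3, 5], "B": [1, 5, 18]}


def cumulative_entropy(counts_by_term):
    """Return the cumulative entropy (in bits) for each time interval.

    For every interval, each term's probability is its cumulative count
    divided by the cumulative count of all terms, and the entropy is the
    sum of -p * log2(p) over the terms.
    """
    cumulative = {term: list(accumulate(counts))
                  for term, counts in counts_by_term.items()}
    n_steps = len(next(iter(cumulative.values())))
    result = []
    for step in range(n_steps):
        total = sum(series[step] for series in cumulative.values())
        entropy = 0.0
        for series in cumulative.values():
            if series[step] > 0:  # a term with no instances contributes nothing
                p = series[step] / total
                entropy -= p * log2(p)
        result.append(entropy)
    return result


entropy_by_year = cumulative_entropy(yearly_counts)
print([round(h, 4) for h in entropy_by_year])
```

The three values agree with the "Cumulative entropy" row of Table B-2 to the precision shown there.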


2. Predicted Entropy Calculation

The predicted entropy value for a time interval is calculated using the trend-line
power equation (the least squares fit through points):

$$y = c x^b \qquad \text{(B.3)}$$

where $c$ and $b$ are constants. The time interval replaces $x$. An example from the
Ada dataset follows in Table B-3.

Percent Error of Actual vs. Predicted

Percent error of actual vs. predicted is calculated using the formula below, and an
example follows in Table B-3.

$$\text{Error} = \frac{\text{Predicted} - \text{Actual}}{\text{Actual}} \qquad \text{(B.4)}$$

Time T   Slice   Actual:        Error (Act vs. Pred)   Predicted:
                 Cum Entropy    y = 4.7404x^0.1489     5 years of data
1        1979    4.48385619      5.71%                 4.48385619
2        1980    5.406900167    -2.86%                 5.406900167
3        1981    5.805013635    -3.93%                 5.805013635
4        1982    6.057909749    -3.94%                 6.057909749
5        1983    6.082181413    -1.11%                 6.082181413
6        1984    6.106601538     1.19%                 6.179377493
7        1985    6.355700897    -0.53%                 6.321976128
8        1986    6.52682382     -1.21%                 6.44815784
9        1987    6.549095131     0.19%                 6.561546835
10       1988    6.611519798     0.80%                 6.664665264


Table B-3 Predicted Entropy Calculation and Error Example

Note: The first 5 intervals under the predicted column are copied from the actual column.
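
A short sketch of equations (B.3) and (B.4) follows. It is illustrative only: it uses the rounded constants c = 4.740 and b = 0.148 that appear in the note to Table B-4, which seem to reproduce the tabulated predictions, and the function names are assumptions.

```python
def power_trend(x, c, b):
    """Trend-line power equation (B.3): y = c * x**b."""
    return c * x ** b


def percent_error(predicted, actual):
    """Percent error of actual vs. predicted (B.4), expressed in percent."""
    return (predicted - actual) / actual * 100.0


# Rounded trend-line constants (assumed from the note under Table B-4).
C, B = 4.740, 0.148

# Time interval 6 (1984) of the Ada example in Table B-3.
actual_1984 = 6.106601538
predicted_1984 = power_trend(6, C, B)
error_1984 = percent_error(predicted_1984, actual_1984)

print(round(predicted_1984, 6), round(error_1984, 2))
```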


3. Time Interval Derivative Calculation 

The du(T) and du(T-c) calculations are based on the derivative of the trend-line's
equation from the cumulative entropy graph. The derivative of the trend-line's
equation is taken and then the time interval replaces $x$. The following is the equation
used:

$$\frac{d}{dx}\left[ y = c x^b \right] = c b x^{b-1} \qquad \text{(1.4)}$$

A usage example from the Ada dataset follows in Table B-4.


Time T   du(T)          du(T-1)        du(T-2)        du(T-5)
1        0.70152
2        0.388653426    0.70152
3        0.275126706    0.388653426    0.70152
4        0.215320284    0.275126706    0.388653426
5        0.178040011    0.215320284    0.275126706
6        0.152424645    0.178040011    0.215320284    0.70152
7        0.133664638    0.152424645    0.178040011    0.388653426
8        0.11929092     0.133664638    0.152424645    0.275126706
9        0.107900992    0.11929092     0.133664638    0.215320284
10       0.098637046    0.107900992    0.11929092     0.178040011
11       0.090943882    0.098637046    0.107900992    0.152424645
12       0.084445719    0.090943882    0.098637046    0.133664638
13       0.078878805    0.084445719    0.090943882    0.11929092
14       0.074052371    0.078878805    0.084445719    0.107900992
15       0.069824897    0.074052371    0.078878805    0.098637046


Table B-4 Time Interval Derivative Calculation Example

Note: y = 4.7404x^0.1489

du(T) = (4.740*0.148)*T^(0.148-1)

du(T-c) = (4.740*0.148)*(T-c)^(0.148-1)
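
The derivative columns can be regenerated directly from equation (1.4). The sketch below assumes the rounded constants quoted in the note (c = 4.740, b = 0.148); the function names and the choice to return `None` for undefined lagged entries are assumptions made for illustration.

```python
def trend_derivative(t, c=4.740, b=0.148):
    """Derivative of the trend line y = c * x**b evaluated at x = t."""
    return c * b * t ** (b - 1)


def lagged_derivative(t, lag, c=4.740, b=0.148):
    """du(T - c): the same derivative evaluated `lag` intervals earlier.

    Returns None where T - lag falls before the first interval, which
    corresponds to an empty cell in the table.
    """
    if t - lag < 1:
        return None
    return trend_derivative(t - lag, c, b)


du_1 = trend_derivative(1)
du_2 = trend_derivative(2)
du_6_lag5 = lagged_derivative(6, 5)
du_1_lag1 = lagged_derivative(1, 1)

print(round(du_1, 5), round(du_2, 9), du_6_lag5, du_1_lag1)
```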


4. Lambda Calculation 

The lambda calculation is dependent on the time interval derivative calculations.
The equation to calculate lambda is:

$$\lambda = \left( f'(x_0) \right)^{1/3} \qquad \text{(B.5)}$$

where $f'(x_0)$ is substituted with:

$$f'(x_0) = \frac{dy}{dx} = \frac{\beta \, \dfrac{du_{(t-c)}}{dt}}{\beta \, \dfrac{du_{(t-c)}}{dt} + \dfrac{du_{(t)}}{dt}} \qquad \text{(B.6)}$$

Substituting for $f'(x_0)$ we get:

$$\lambda = \left( \frac{\beta \, \dfrac{du_{(t-c)}}{dt}}{\beta \, \dfrac{du_{(t-c)}}{dt} + \dfrac{du_{(t)}}{dt}} \right)^{1/3} \qquad \text{(B.7)}$$

The values from the time interval derivative equation (1.4) are placed into
(B.7) with varying $\beta$ values (e.g. 0.1, 0.2, 0.5, 0.75). Table B-5 shows an example of the
lambda calculation.


Time T   Cum Entropy    Du(T)          C_y_10%        Lambda_β_10%_y   β_10%_y
1        4.48385619     0.70152
2        5.406900167    0.388653426    4.87216694     0.534733226      0.1
3        5.805013635    0.275126706    5.306648158    0.498365477      0.1
4        6.057909749    0.215320284    5.574025253    0.483884497      0.1
5        6.082181413    0.178040011    5.60612135     0.476060063      0.1
6        6.106601538    0.152424645    5.635448868    0.47115267       0.1
7        6.355700897    0.133664638    5.887915573    0.467785324      0.1
8        6.52682382     0.11929092     6.061493127    0.465330693      0.1
9        6.549095131    0.107900992    6.085633441    0.46346169       0.1
10       6.611519798    0.098637046    6.14952886     0.461990938      0.1
11       6.64290191     0.090943882    6.182098576    0.460803334      0.1
12       6.725985485    0.084445719    6.266161229    0.459824255      0.1


Table B-5 Lambda Calculation Example


Note: "C_y_10%" is found from "Cum Entropy" minus "Lambda_β_10%_y".
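
The lambda column can be sketched as follows. This is an illustration under stated assumptions: a lag of one interval (c = 1), β = 0.1, the rounded trend-line constants 4.740 and 0.148, and a cube root of the ratio of β-weighted lagged derivative to the sum of that term and the current derivative, which is the form that appears to match the tabulated values. The function names are hypothetical.

```python
def trend_derivative(t, c=4.740, b=0.148):
    """Derivative of the trend line y = c * x**b evaluated at x = t."""
    return c * b * t ** (b - 1)


def lambda_value(t, beta, lag=1):
    """Lambda for time interval t with weighting beta and the given lag.

    ratio = beta * du(t - lag) / (beta * du(t - lag) + du(t))
    lambda = ratio ** (1/3)
    """
    du_lagged = trend_derivative(t - lag)
    du_now = trend_derivative(t)
    ratio = beta * du_lagged / (beta * du_lagged + du_now)
    return ratio ** (1.0 / 3.0)


lam_2 = lambda_value(2, beta=0.1)
lam_12 = lambda_value(12, beta=0.1)

# "C_y_10%" is the cumulative entropy minus lambda (interval 2 shown here).
c_y_2 = 5.406900167 - lam_2

print(round(lam_2, 6), round(lam_12, 6), round(c_y_2, 6))
```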

5. Lyapunov Exponent Calculation

The Lyapunov exponent calculation depends on the trend-line equation from the
map of entropy at time steps k and k+1. The derivative is taken the same way as in
equation (1.4). Once the derivative is found, the time interval is substituted for x. A usage
example from the Ada dataset follows in Table B-6.


Time T   Cum Entropy K   Cum K+1        Lyapunov Exp f'(k,k+1) =
                                        0.724*1.720*k^(0.724-1)
1        4.48385619      5.406900167    0.823021444
2        5.406900167     5.805013635    0.781579695
3        5.805013635     6.057909749    0.766403215
4        6.057909749     6.082181413    0.757435961
5        6.082181413     6.106601538    0.756600505
6        6.106601538     6.355700897    0.755764222
7        6.355700897     6.52682382     0.74747023
8        6.52682382      6.549095131    0.742009203
9        6.549095131     6.611519798    0.741311905
10       6.611519798     6.64290191     0.739373453
11       6.64290191      6.725985485    0.738407755
12       6.725985485     6.817516503    0.735878946


Table B-6 Lyapunov Exponent Calculation Example

Note: y = 1.7208x^0.7241

dy/dx = (0.724*1.720)*K^(0.724-1)
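
The last column of Table B-6 can be regenerated with the sketch below. It assumes the rounded constants 1.720 and 0.724 from the note, and it evaluates the derivative at the cumulative entropy of step k, which is the substitution that appears to reproduce the tabulated values. The function name is hypothetical.

```python
def map_derivative(entropy_k, c=1.720, b=0.724):
    """Derivative of the k -> k+1 entropy map y = c * x**b at x = entropy_k."""
    return b * c * entropy_k ** (b - 1)


# Cumulative entropy at time steps 1 and 12 of the Ada example (Table B-6).
value_step_1 = map_derivative(4.48385619)
value_step_12 = map_derivative(6.725985485)

print(round(value_step_1, 6), round(value_step_12, 6))
```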
APPENDIX C SAMPLE DATA


Sample data is on the CD under the directory labeled

\Entropy data analysis\

APPENDIX D TECH OASIS INTERFACE SOURCE CODE


This section contains the source code of the Tech OASIS interface. This source
was written by Matt Behnke as partial contribution to his Masters Degree in Software
Engineering in support of Dr. Michael S. Saboe. They can be reached at
saboem@tacom.army.mil and behkneM@tacom.army.mil .

The source code is on the CD under the \Entropy data analysis\ directory


' Script: cumEntropy.tmf
' Author: Matt Behnke
' Created: 9/10/01
'
' Description: Tech OASIS script that prompts the user to select the data
' field and time field to use in calculating the cumulative entropy.
'
' The script exports the co-occurrence matrix of the two fields into
' Microsoft Excel and then calls an excel macro to finish the manipulation
' of the raw data to create a summary and graphs.

Option Explicit
'declare variables
dim nStatus, strDataField, strTimeField, arrayGroupNames
dim exApp, strView, strDirectoryPath

'prompt for and get user input (R1.1, R1.2)
msgbox("Select data field to compute entropy on ")
nStatus = Dataset.PromptForField(strDataField)
msgbox("Select time field that contains intervals as groups ")
nStatus = Dataset.PromptForField(strTimeField)

'check to make sure there are groups inside the user selected time field (R1.3)
nStatus = Dataset.GetGroupNames(strTimeField, arrayGroupNames)
If (IsArray(arrayGroupNames)) Then
Else
    MsgBox("There are no Groups in the time field! Program ending...")
    Stop
End If

'Open Excel Workbook (R1.5)
Set exApp = CreateObject("Excel.Application")
exApp.Visible = True
exApp.Workbooks.Add

call createMatrix(strDataField, strTimeField)
call runExcelMacro


' Function: createMatrix
' Author: Matt Behnke
' Created: 9/10/01
'
' Description: 1) Creates the co-occurrence matrix of data field (rows) X time
' field (columns) (R1.4)
' 2) Exports this matrix to the opened excel file. (R1.6)
'
' Inputs: strDataField - user selected data field
'         strTimeField - user selected time field
' Outputs: none

Sub createMatrix(strDataField, strTimeField)
    'create and sort matrix
    nStatus = View.CreateMatrix(strDataField, "UNGROUPED", strTimeField, "GROUPED", "COOCCURENCE", strView)
    nStatus = Matrix.Sort("ROW", 2, "DESCEND")
    nStatus = Matrix.SelectAll()
    nStatus = Matrix.CopySelection()

    'paste into excel
    exApp.ActiveSheet.Paste
end sub


' Function: runExcelMacro
' Author: Matt Behnke
' Created: 9/10/01
'
' Description: Calls the excel macro "Cumulative" inside "cumEntropy.xls"
' located in the vantagepoint (Tech OASIS) macros directory.
' The macro finishes the calculation of cumulative entropy. (R1.7)
' Inputs: none
' Outputs: none

Sub runExcelMacro()
    nStatus = App.GetPath(strDirectoryPath)
    strDirectoryPath = strDirectoryPath & "Macros/"
    exApp.WorkBooks.Open(strDirectoryPath & "cumEntropy")
    exApp.Windows(2).Activate
    exApp.Application.Run "cumEntropy.xls!Cumulative"
    exApp.visible = true
    exApp.WorkBooks(2).Close
end sub

APPENDIX E DATA ANALYSIS SOURCE CODE


This source was written by Matt Behnke as partial contribution to his Masters
Degree in Software Engineering in support of Dr. Michael S. Saboe. They can be
reached at saboem@tacom.army.mil and behkneM@tacom.army.mil .

The source code is on the CD under the \Entropy data analysis\ directory

This section contains the source code used to complete all of the
data analysis.

' MACRO: AffiliationMacro
' Author: Matt Behnke
' Created: 11/5/01

'GLOBAL VARIABLES
Dim technologyName As String          'name of the dataset (ada, java, etc)
Dim steplnterval As String            'the time between time steps (months, years)
Dim currFilename As String            'the name of the spreadsheet file
Dim datasheet As String               'sheet that contains the matrix of affiliations
Dim descriptorMatrixSheet As String   'sheet that contains the matrix of terms (X)
Dim descriptorMatrixSheetY As String  'sheet that contains the matrix of terms (opposite of X)
Dim worldEntropySheet As String       'sheet that contains world entropy
Dim worldEntropySheetY As String      'sheet that contains world entropy y (opposite of X)
Dim affiliationDescMatrix As String   'sheet that associates terms to affiliations

'CONSTANTS
Private Const HYP3FIT As Integer = 0
Private Const EXP3FIT As Integer = 1
Private Const POW2FIT As Integer = 2


' Sub: DistributeAffiliations
' Author: Matt Behnke
' Created: 11/5/01
'
' Description: The sub routine that calls all the sub routines for the affiliation distribution
' inputs:
' Outputs:

Sub DistributeAffiliations()
    currFilename = Application.ActiveWorkbook.Name
    datasheet = ActiveSheet.Name

    'sheet name constants
    descriptorMatrixSheet = "descriptordataX"
    descriptorMatrixSheetY = "descriptordataY"
    worldEntropySheet = "WorldCumulativeEntropyX"
    worldEntropySheetY = "WorldCumulativeEntropyY"
    affiliationDescMatrix = "descriptormatrixaffil"

    technologyName = InputBox("Enter the name of the technology.")
    steplnterval = InputBox("Enter the time between time steps")

    Call formatSheetForPrint
    Call CopyMathCadObj

    'put the cumulative values on the sheets:
    Call CalcCumulative(dataSheet) 'datasheet has the num records each affiliation produced over time
    Call CalcCumulative(descriptorMatrixSheet)
    Call CalcCumulative(affiliationDescMatrix)
    Call CalcCumulative("Affiliation_authors")

    'determine the num of records in each band
    Call AffiliationDistribution

    'use the summary sheet created by Affiliation_Distribution to graph the distributions of each band:
    Call CopyDistributionGraph

    'compute world entropy (input, output) obsolete
    'Call ComputeEntropy(descriptorMatrixSheet, worldEntropySheet)

    'create descriptor data y sheet from descriptor data x sheet:
    Call CreateDescriptorDataY("descriptor_data_X", "descriptor_data_Y")

    'compute world entropy sheets x and y (input, output)
    Call ComputeEntropy("descriptor_data_X", "World Cumulative Entropy X", 1)
    Call ComputeEntropy("descriptor_data_Y", "World Cumulative Entropy Y", 2)

    'fills the band stats of the world:
    Call FillBandStats("World")

    'compute nu and psi for the world:
    Call v_calc_v_psi_sheet("World")

    'BANDS

    'fill the band with the affiliations and their number of publications that fit the number of
    'publications range for that band determined by Affiliation_Distribution:
    Call FillBand("A_Band")
    'fill band stats:
    Call FillBandStats("A_Band")
    'create the matrix of affiliation with author instances
    Call FillBandAuthors("A_Band")
    'calculate nu and psi:
    Call v_calc_v_psi_sheet("A_Band")
    'determine the matrix of terms and the number of instances for the band
    Call FillBandTerms("A_Band")
    'compute the entropy of the terms in the band
    Call FillBandTermsEntropy("A_Band")
    'create a summary of band, num of publications, authors, terms, entropy:
    Call affiliationBandSummary("A_Band")

    Call FillBand("B_Band")
    Call FillBandStats("B_Band")
    Call FillBandAuthors("B_Band")
    Call v_calc_v_psi_sheet("B_Band")
    Call FillBandTerms("B_Band")
    Call FillBandTermsEntropy("B_Band")
    Call affiliationBandSummary("B_Band")

    Call FillBand("C_Band")
    Call FillBandStats("C_Band")
    Call FillBandAuthors("C_Band")
    Call v_calc_v_psi_sheet("C_Band")
    Call FillBandTerms("C_Band")
    Call FillBandTermsEntropy("C_Band")
    Call affiliationBandSummary("C_Band")

    Call FillBand("D_Band")
    Call FillBandStats("D_Band")
    Call FillBandAuthors("D_Band")
    Call v_calc_v_psi_sheet("D_Band")
    Call FillBandTerms("D_Band")
    Call FillBandTermsEntropy("D_Band")
    Call affiliationBandSummary("D_Band")

    Call entropySummary 'for the world
    Call affiliationSummary 'for the world
    Call affiliationSummaryPart2 'copies graphs and computes
    Call affiliationSummaryPart3 'temp and pressure...

    Call CopyABCDGraph 'copy the abcd learning curve graphs
    Call fillMonthsRowTrigger

    Call CopyBandSummaryGraphs("A_Band") 'entropy summary graphs
    Call CopyBandSummaryGraphs("B_Band")
    Call CopyBandSummaryGraphs("C_Band")
    Call CopyBandSummaryGraphs("D_Band")
    Call CopyBandSummaryGraphs("World")
End Sub


' Sub: AffiliationDistribution
' Author: Matt Behnke
' Created: 11/5/01
'
' Description: figures out the division of bands, and the number of affiliations per band
' inputs:
' Outputs:

Sub AffiliationDistribution()
    Sheets.Add After:=Worksheets(Worksheets.Count)
    numRows = CountRows(dataSheet, 1)
    Sheets(Worksheets.Count).Select
    ActiveSheet.Name = "Distribution"

    Cells(1, 1) = "Statistics"
    Cells(2, 1).FormulaR1C1 = "Mean"
    Cells(2, 2).Formula = "=AVERAGE(" & datasheet & "!A2:A" & numRows & ")"
    Cells(3, 1) = "Stdev"
    Cells(3, 2).Formula = "=STDEV(" & datasheet & "!A2:A" & numRows & ")"
    Cells(4, 1) = "Sum"
    Cells(4, 2).Formula = "=SUM(" & datasheet & "!A2:A" & numRows & ")"
    Cells(5, 1) = "Count"
    Cells(5, 2).Formula = numRows - 1

    Cells(2, 5).Formula = "Calculate Bands"
    Cells(3, 5).Formula = "Band_D"
    Cells(3, 6).Formula = "Band_C"
    Cells(3, 7).Formula = "Band_B"
    Cells(3, 8).Formula = "Band_A"
    Cells(4, 4).Formula = "from"
    Cells(5, 4).Formula = "to"

    Cells(4, 5).Formula = "0"              'Band D from
    Cells(5, 5) = "=ROUND(B2+B3,3)"        'band d to
    Cells(4, 6) = "=ROUND(B2+B3,3)"        'band c from
    Cells(5, 6) = "=ROUND(B2+B3*2,3)"      'band c to
    Cells(4, 7) = "=ROUND(B2+B3*2,3)"      'band b from
    Cells(5, 7) = "=ROUND(B2+B3*3,3)"      'band b to
    Cells(4, 8) = "=ROUND(B2+B3*3,3)"      'band a from

    'bin labels
    Cells(7, 1) = "Bin"
    Cells(7, 2) = "Frequency"

    counter = 1
    For i = 1 To Round(Cells(5, 5).Value, 0) 'get bin values for band A
        Cells(7 + i, 1) = i
        Cells(7 + i, 2) = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""=" & i & """)"
        counter = counter + 1
    Next i

    Cells(7 + counter, 1) = Cells(5, 6) 'put in next bin (band c end)
    Cells(1, 9).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<" & Cells(5, 6) & """)"
    Cells(1, 10).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<" & Cells(4, 6) & """)"
    Cells(7 + counter, 2) = Abs(Cells(1, 9) - Cells(1, 10))
    counter = counter + 1

    Cells(7 + counter, 1) = Cells(5, 7) 'band b end
    Cells(1, 9).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<" & Cells(5, 7) & """)"
    Cells(1, 10).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<" & Cells(4, 7) & """)"
    Cells(7 + counter, 2) = Abs(Cells(1, 9) - Cells(1, 10))
    counter = counter + 1

    exitIf = False

    If Cells(5, 7) < 15 Then 'add more bins 15-30...
        Cells(7 + counter, 1) = "15"
        Cells(1, 9).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<= 15"")"
        Cells(1, 10).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<" & Cells(4, 8) & """)"
        Cells(7 + counter, 2) = Abs(Cells(1, 9) - Cells(1, 10))
        If Cells(7 + counter, 2) = 0 Then
            Cells(7 + counter, 1) = "> " & Cells(4, 8)
            Cells(7 + counter, 2).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", "">=" & Cells(4, 8) & """)"
            exitIf = True
        End If
        counter = counter + 1

        If exitIf = False Then
            Cells(7 + counter, 1) = "20"
            Cells(1, 9).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<= 20"")"
            Cells(1, 10).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""< 16"")"
            Cells(7 + counter, 2) = Abs(Cells(1, 9) - Cells(1, 10))
            If Cells(7 + counter, 2) = 0 Then
                Cells(7 + counter, 1) = "> 15"
                Cells(7 + counter, 2).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""> 15"")"
                exitIf = True
            End If
        End If ' exitif
        counter = counter + 1

        If exitIf = False Then
            Cells(7 + counter, 1) = "25"
            Cells(1, 9).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<= 25"")"
            Cells(1, 10).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""< 21"")"
            Cells(7 + counter, 2) = Abs(Cells(1, 9) - Cells(1, 10))
            If Cells(7 + counter, 2) = 0 Then
                Cells(7 + counter, 1) = "> 20"
                Cells(7 + counter, 2).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""> 20"")"
                exitIf = True
            End If
        End If ' exitif
        counter = counter + 1

        If exitIf = False Then
            Cells(7 + counter, 1) = "30"
            Cells(1, 9).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<= 30"")"
            Cells(1, 10).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""< 26"")"
            Cells(7 + counter, 2) = Abs(Cells(1, 9) - Cells(1, 10))
            If Cells(7 + counter, 2) = 0 Then
                Cells(7 + counter, 1) = "> 25"
                Cells(7 + counter, 2).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""> 25"")"
                exitIf = True
            End If
        End If ' exitif
        counter = counter + 1

        If exitIf = False Then
            Cells(7 + counter, 1) = "> 30"
            Cells(7 + counter, 2) = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""> 30"")"
        End If
    Else
        If exitIf = False Then
            Cells(7 + counter, 1) = "20"
            Cells(1, 9).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<= 20"")"
            Cells(1, 10).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<= 15"")"
            Cells(7 + counter, 2) = Abs(Cells(1, 9) - Cells(1, 10))
            If Cells(7 + counter, 2) = 0 Then
                Cells(7 + counter, 1) = "> 15"
                Cells(7 + counter, 2).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""> 15"")"
                exitIf = True
            End If
        End If ' exitif
        counter = counter + 1

        If exitIf = False Then
            Cells(7 + counter, 1) = "25"
            Cells(1, 9).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<= 25"")"
            Cells(1, 10).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<= 20"")"
            Cells(7 + counter, 2) = Abs(Cells(1, 9) - Cells(1, 10))
            If Cells(7 + counter, 2) = 0 Then
                Cells(7 + counter, 1) = "> 20"
                Cells(7 + counter, 2).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""> 20"")"
                exitIf = True
            End If
        End If ' exitif
        counter = counter + 1

        If exitIf = False Then
            Cells(7 + counter, 1) = "30"
            Cells(1, 9).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<= 30"")"
            Cells(1, 10).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""<= 25"")"
            Cells(7 + counter, 2) = Abs(Cells(1, 9) - Cells(1, 10))
            If Cells(7 + counter, 2) = 0 Then
                Cells(7 + counter, 1) = "> 25"
                Cells(7 + counter, 2).Formula = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""> 25"")"
                exitIf = True
            End If
        End If ' exitif
        counter = counter + 1

        If exitIf = False Then
            Cells(7 + counter, 1) = "> 30"
            Cells(7 + counter, 2) = "=COUNTIF(" & datasheet & "!A2:A" & numRows & ", ""> 30"")"
        End If
    End If

    Call formatSheetForPrint
End Sub


' Sub: CopyMathCadObj
' Author: Matt Behnke
' Created: 12/5/01
'
' Description: copies the mathcad object, for running a curve fit.
' inputs:
' Outputs:

Sub CopyMathCadObj()
    Windows("AffiliationMacro.xls").Activate
    Sheets("Mathcad").Select
    Sheets("Mathcad").Copy Before:=Workbooks(currFilename).Sheets(dataSheet)
    ' If ActiveSheet.Name = "Mathcad" Then
    '     ActiveSheet.Name = "Mathcad_" & band
    ' Else
    '     MsgBox ("Mathcad sheet rename failed")
    ' End If
End Sub


' Sub: CopyDistributionGraph
' Author: Matt Behnke
' Created: 11/5/01
'
' Description: copies the distribution graph from the macro sheet into the data spreadsheet
' inputs:
' Outputs:

Sub CopyDistributionGraph()
    Application.DisplayAlerts = False
    numSheets = Sheets.Count

    Windows("AffiliationMacro.xls").Activate
    Sheets("Affiliation Distribution Sample").Select
    Sheets("Affiliation Distribution Sample").Copy After:=Workbooks(currFilename).Sheets(numSheets)

    ActiveChart.SeriesCollection(1).Select
    ActiveChart.SeriesCollection(1).XValues = "=Distribution!R8C1:R18C1"
    ActiveChart.SeriesCollection(1).Values = "=Distribution!R8C2:R19C2"
    ActiveChart.ChartTitle.Characters.Text = "Productivity Distribution" & Chr(10) _
        & technologyName & " (" & steplnterval & ")"

    Application.DisplayAlerts = True
End Sub


' Sub: ComputeEntropy
' Author: Matt Behnke
' Created: 1/28/02
'
' Description: Computes the cumulative entropy using the supplied datasheets
'   note number of instances must begin at row 2, column 4.
' inputs: datasheet - matrix of the descriptorData. Y-axis is the terms, X-axis is timesteps, v is # of instances
'           time   1, 2, 3, 4.
'           term1  v  v  v
'           term2  v
'         outSheet: name of the sheet that contains the computed entropy.
'         theType: 1) s(x|y), 2) s(y|x)
' Outputs:

Sub ComputeEntropy(ByVal datasheet As String, ByVal outSheet As String, ByVal theType As Integer)
    numCols = CountCols(dataSheet, 1)
    numRows = CountRows(dataSheet, 1)

    Sheets.Add After:=Worksheets(Worksheets.Count)
    Sheets(Sheets.Count).Select
    ActiveSheet.Name = outSheet
    Worksheets(outSheet).Move After:=Worksheets(dataSheet)

    For i = 1 To numCols
        'total number of instances: per time step for type 1, first time step for type 2
        If i >= 4 And theType = 1 Then
            TotalNumInstances = Sheets(dataSheet).Cells(numRows + 1, i)
        ElseIf i >= 4 And theType = 2 Then
            TotalNumInstances = Sheets(dataSheet).Cells(numRows + 1, 4)
        End If

        For j = 1 To numRows
            If i >= 4 And j >= 2 Then
                numInstances = Sheets(dataSheet).Cells(j, i)
                If numInstances > 0 Then
                    entropy = -numInstances / TotalNumInstances * (Log(numInstances / TotalNumInstances) / Log(2))
                    Sheets(outSheet).Cells(j, i) = entropy
                End If
                If j = numRows Then 'put in sum of entropy
                    Sheets(outSheet).Cells(j + 1, i) = "=SUM(" & col(i) & "2:" & col(i) & numRows & ")"
                End If
            Else 'copy terms, count, first pub date
                Sheets(outSheet).Cells(j, i).Value = Sheets(dataSheet).Cells(j, i).Value
            End If
        Next j
    Next i
End Sub


' Sub: CreateDescriptorDataY
' Author: Matt Behnke
' Created: 1/28/02
'
' Description: Takes the supplied descriptor data sheet and creates the Y part of the (X,Y) world.
'              As x increases y decreases: a value decreases on the y sheet when a value
'              increases on the x sheet.
'
' inputs: dataSheet - matrix of the descriptorData. Y-axis is the terms, X-axis is timesteps,
'         v is # of instances
'
'              time  1, 2, 3, 4
'              term1 v  v  v
'              term2 v
'
'         outSheet: name of the sheet that contains DescriptorDataY
'
' Outputs:

Sub CreateDescriptorDataY(ByVal dataSheet As String, ByVal outSheet As String)

    numCols = CountCols(dataSheet, 1)
    numRows = CountRows(dataSheet, 1)

    Worksheets(dataSheet).Copy After:=Worksheets(dataSheet)
    Sheets(dataSheet & " (2)").Select
    ActiveSheet.Name = outSheet

    For i = 2 To numRows
        For j = 4 To numCols
            numTotalInstances = Sheets(outSheet).Cells(i, 2)
            numInstances = Sheets(outSheet).Cells(i, j)

            If j = 4 And Sheets(outSheet).Cells(i, j) > 0 Then 'places the initial value at the end
                lastColumn = Sheets(outSheet).Cells(i, j)
            End If

            Sheets(outSheet).Cells(i, j) = numTotalInstances - numInstances
            If j = numCols And lastColumn > 0 Then
                'places the value of first column x into last col in Y
                Sheets(outSheet).Cells(i, j) = lastColumn
            End If

            If i = numRows Then 'put in sum
                Sheets(outSheet).Cells(i + 1, j) = "=SUM(" & col(j) & "2:" & col(j) & numRows & ")"
            End If
        Next j

        lastColumn = 0
    Next i

End Sub 'CreateDescriptorDataY


Sub computeEntropyTest() 

'IT WORKS 

    'Call ComputeEntropy("descriptor_data_X", "World_Cumulative_Entropy_X", 1)
    Call ComputeEntropy("descriptor_data_Y", "World_Cumulative_Entropy_Y", 2)
    'Call CreateDescriptorDataY("descriptor_data_X", "descriptor_data_Y")

End Sub 


' Sub: FillBand
' Author: Matt Behnke
' Created: 11/7/01
'
' Description: fills in a band's distribution by copying a row from the list of all the
'              affiliations (dataSheet)
'
' inputs: band name
'
' Outputs:

Sub FillBand(ByVal band As String)

    Sheets.Add After:=Worksheets(Worksheets.Count)

    numRows = CountRows(dataSheet, 1)
    Sheets(Worksheets.Count).Select
    currSheetName = ActiveSheet.Name
    Columns("C:C").ColumnWidth = 62.43

    Select Case band
        Case "A_Band"
            bandFrom = Sheets("Distribution").Cells(4, 8)
            bandTo = 32500
        Case "B_Band"
            bandFrom = Sheets("Distribution").Cells(4, 7)
            bandTo = Sheets("Distribution").Cells(5, 7)
        Case "C_Band"
            bandFrom = Sheets("Distribution").Cells(4, 6)
            bandTo = Sheets("Distribution").Cells(5, 6)
        Case "D_Band"
            bandFrom = Sheets("Distribution").Cells(4, 5)
            bandTo = Sheets("Distribution").Cells(5, 5)
    End Select

    Sheets("" & dataSheet & "").Select
    Rows("1:1").Select
    Selection.Copy
    Sheets(currSheetName).Select
    Rows("1:1").Select
    ActiveSheet.Paste

    counter = 2

    For i = 2 To numRows 'copy rows from datasheet into band
        If Sheets(dataSheet).Cells(i, 1) >= bandFrom And Sheets(dataSheet).Cells(i, 1) <= bandTo Then
            Sheets(dataSheet).Select
            Rows(i & ":" & i).Select
            Selection.Copy
            Sheets(currSheetName).Select
            Rows(counter & ":" & counter).Select
            ActiveSheet.Paste
            counter = counter + 1
        End If
    Next i

    numRowsInBand = CountRows(currSheetName, 1)
    numColumns = CountCols(currSheetName, 1) 'num time steps

    Cells(numRowsInBand + 1, 3) = "Count"
    Cells(numRowsInBand + 2, 3) = "Mean"
    Cells(numRowsInBand + 3, 3) = "Std Dev"
    Cells(numRowsInBand + 4, 3) = "Sum"

    For i = 4 To numColumns 'put in the mean and std deviation for each time step
        'put in zeros if nothing there
        ' For j = 2 To numRowsInBand
        '     If i = 4 Then
        '         If Cells(j, i) > 0 Then
        '         Else
        '             Cells(j, i) = 0
        '         End If
        '     Else
        '         Cells(j, i) = Cells(j, i) + Cells(j, i - 1)
        '     End If
        ' Next j

        'dont put in zeros if nothing there
        ' For j = 2 To numRowsInBand
        '     If (Cells(j, i) > 0 And i > 4) Or (i > 4 And Cells(j, i - 1) > 0) Then
        '         Cells(j, i) = Cells(j, i) + Cells(j, i - 1)
        '     End If
        ' Next j

        Cells(numRowsInBand + 4, i).Formula = "=Sum(" & col(i) & "2:" & col(i) & _
            numRowsInBand & ")"
        Cells(numRowsInBand + 1, i).Formula = "=Countif(" & col(i) & "2:" & col(i) & _
            numRowsInBand & ", "">0"")"

        If Cells(numRowsInBand + 1, i) > 0 Then
            Cells(numRowsInBand + 2, i).Formula = "=AVERAGE(" & col(i) & "2:" & col(i) & _
                numRowsInBand & ")"
            If Cells(numRowsInBand + 1, i) > 1 Then 'more than one so compute std deviation
                Cells(numRowsInBand + 3, i).Formula = "=STDEV(" & col(i) & "2:" & col(i) & _
                    numRowsInBand & ")"
            End If
        End If
    Next i

    ActiveSheet.Name = "Affiliation_Cum_Dist_" & band
    Call formatSheetForPrint

End Sub
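
' Illustrative addition (not part of the original module): FillBand, FillBandAuthors
' and CalcCumulative each append the same Count / Mean / Std Dev / Sum rows beneath
' the data. A minimal sketch of that step as one routine follows. The name
' AppendSummaryRows and its parameters are assumptions; R1C1 formulas are used
' here so the sketch does not depend on the module's col() helper.

```vba
Sub AppendSummaryRows(ByVal ws As Worksheet, ByVal lastDataRow As Long, _
                      ByVal firstCol As Long, ByVal lastCol As Long)
    Dim c As Long
    Dim rng As String

    ' Row labels go in column 3, as in the original routines
    ws.Cells(lastDataRow + 1, 3).Value = "Count"
    ws.Cells(lastDataRow + 2, 3).Value = "Mean"
    ws.Cells(lastDataRow + 3, 3).Value = "Std Dev"
    ws.Cells(lastDataRow + 4, 3).Value = "Sum"

    For c = firstCol To lastCol
        ' Data occupies rows 2..lastDataRow of column c
        rng = "R2C" & c & ":R" & lastDataRow & "C" & c
        ws.Cells(lastDataRow + 4, c).FormulaR1C1 = "=SUM(" & rng & ")"
        ws.Cells(lastDataRow + 1, c).FormulaR1C1 = "=COUNTIF(" & rng & ","">0"")"
        ' Mean needs at least one positive entry, std deviation at least two
        If ws.Cells(lastDataRow + 1, c).Value > 0 Then
            ws.Cells(lastDataRow + 2, c).FormulaR1C1 = "=AVERAGE(" & rng & ")"
        End If
        If ws.Cells(lastDataRow + 1, c).Value > 1 Then
            ws.Cells(lastDataRow + 3, c).FormulaR1C1 = "=STDEV(" & rng & ")"
        End If
    Next c
End Sub
```

' Usage, under the same assumptions:
'   AppendSummaryRows ActiveSheet, numRowsInBand, 4, numColumns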


' Sub: FillBandStats
' Author: Matt Behnke
' Created: 11/7/01
' Revised: 12/3/01
'
' Description: creates a band's statistics sheet
' inputs: band name
'
' Outputs:

Sub FillBandStats(ByVal band As String)

    Dim data As Variant

    Sheets.Add After:=Worksheets(Worksheets.Count)
    If band = "World" Then
        source = dataSheet
    Else
        source = "Affiliation_Cum_Dist_" & band
    End If

    numRowsInBand = CountRows(source, 1)
    numTimeStepsInBand = CountCols(source, 1) - 3
    Sheets(Worksheets.Count).Select
    'Columns("C:C").ColumnWidth = 62.43

    Cells(5, 1) = " "
    Cells(6, 1) = " "
    Cells(7, 1) = ""
    Cells(8, 1) = " "
    Cells(9, 1) = " "
    Cells(11, 1) = " "
    Cells(12, 1) = " "

    Cells(1, 1) = "Curve fit y(t) y(t) = bt^m"
    Cells(3, 1) = "b"
    Cells(4, 1) = "m"

    Cells(8, 3) = "Total Production"
    Cells(8, 6) = "Production/Step (on Average)"
    Cells(8, 11) = "Calculated Production/Step)"

    Cells(10, 1) = "Time Step"
    Cells(10, 2) = "Step Name"
    Cells(10, 3) = "Mean"
    Cells(10, 4) = "Std Deviation"
    Cells(10, 5) = "'+3 sigma"

    'average per step
    Cells(10, 6) = "Mean"
    Cells(10, 7) = "Std Deviation"
    Cells(10, 8) = "'+3 sigma"
    Cells(10, 9) = "Total Prod"
    Cells(10, 10) = "kappa"
    Cells(10, 11) = "kappa/2"
    Cells(10, 12) = "rvalue"
    Cells(10, 13) = "Mean"
    Cells(10, 14) = "R^2"
    Cells(10, 15) = "'+3 sigma"

    Cells(13, 1) = "0" 'time step zero
    For i = 1 To numTimeStepsInBand
        Cells(13 + i, 1) = i 'step number
        Cells(13 + i, 2) = Sheets(source).Cells(1, i + 3) 'step name

        'Total Production
        Cells(13 + i, 3) = Sheets(source).Cells(numRowsInBand + 2, i + 3) 'mean
        Cells(13 + i, 4) = Sheets(source).Cells(numRowsInBand + 3, i + 3) 'stdev
        Cells(13 + i, 5) = Cells(13 + i, 3) + 3 * Cells(13 + i, 4) 'mean + 3std

        'Production per step on avg
        If i = 1 Then
            'Cells(13 + i, 6) = Cells(13 + i, 3) / Cells(13 + i, 1) 'mean
            Cells(13 + i, 6) = Cells(13 + i, 3)
            Cells(13 + i, 7) = Cells(13 + i, 4) 'stdev
            Cells(13 + i, 8) = Cells(13 + i, 5) 'mean + 3std
        Else
            Cells(13 + i, 6) = Cells(13 + i, 3) - Cells(12 + i, 3)
            Cells(13 + i, 7) = Cells(13 + i, 4) - Cells(12 + i, 4) 'stdev
            Cells(13 + i, 8) = Cells(13 + i, 5) - Cells(12 + i, 5) 'mean + 3std
        End If

        If i = numTimeStepsInBand Then 'put in average
            Cells(14 + i, 8) = "=AVERAGE(H14:H" & i + 13 & ")"
        End If

        If i = 1 Then
            Cells(13 + i, 9) = Cells(13 + i, 8)
        Else
            Cells(13 + i, 9) = Cells(13 + i, 8) + Cells(12 + i, 9)
        End If

        Cells(13 + i, 10) = "=H" & 14 + numTimeStepsInBand 'avg of mean + 3std
    Next i

    ActiveSheet.Name = "" & band & "_Stats"

    Call copyStatGraphs(numTimeStepsInBand, band, "" & band & "_Stats")

    Sheets("" & band & "_Stats").Select

    'get formula of trendline from entropy power trend graph
    trendEq = Cells(2, 1)
    Cells(3, 2) = firstPartTrendEq(trendEq)
    Cells(4, 2) = secondPartTrendEq(trendEq)

    'get kappa, r, p
    Cells(3, 3) = "kappa" 'headers
    Cells(4, 3) = "r"
    Cells(5, 3) = "p"
    Cells(6, 3) = "1-Sum(r^2)"

    j = 14 'get the start row

    While Not (i > 0#) 'if the first time step's mean is zero find the step that doesnt have 0
        i = Cells(j, 3).Value
        If Not (i > 0) Then
            j = j + 1
        End If
    Wend

    'j = j + 1 'add one to the starting row to not include the first time step
    numRowsToUse = numTimeStepsInBand - (j - 14)

    data = Update_Mathcad_Band_Stats("Mathcad", ActiveSheet.Name, "C" & j, "F" & j, _
        numRowsToUse, 0, 0.001)

    kappa = data(1)
    r = data(2)
    p = data(3)
    r2a = data(4)

    'put on sheet
    Cells(3, 4) = Round(kappa, 4)
    Cells(4, 4) = Round(r, 4)
    Cells(5, 4) = Round(p, 4)
    Cells(6, 5) = Round(r2a, 4)

    'calculate predicted means for -1, -2 under total
    'Cells(11, 3).Formula = "=-$B$3*-A11^$B$4" 'not needed
    'Cells(12, 3).Formula = "=-$B$3*-A12^$B$4"

    Cells(13, 3) = 0


    For i = 1 To numTimeStepsInBand
        'fill in kappa, kappa/2, rvalue
        Cells(13 + i, 10) = "=$D$3"
        Cells(13 + i, 11) = "=$D$3 / 2"
        Cells(13 + i, 12) = "=$D$4"

        'fill in calculated prod per step
        Cells(13 + i, 13).Formula = "=$D$3*(C" & 13 + i & "+$D$5)/(C" & 13 + i & _
            "+$D$4+$D$5)" 'mean
        Cells(13 + i, 14) = "=(M" & 13 + i & "-F" & 13 + i & ")*(M" & 13 + i & "-F" & 13 + i & ")" 'R^2
        sumRSquared = Cells(13 + i, 14) + sumRSquared
        Cells(13 + i, 15) = "=$D$3*(E" & 13 + i & "+$D$5)/(E" & 13 + i & "+$D$4+$D$5)"

        If i = numTimeStepsInBand Then 'put in average R^2 - REMOVE AFTER dbl checking values.
            Cells(15 + i, 10) = "Sum(R^2)"
            Cells(15 + i, 12) = "=Sum(N14:N" & i + 13 & ")"
            Cells(16 + i, 12) = "=1-L" & i + 15
        End If
    Next i

    inverseRSquared = (1 - sumRSquared) 'from the sum of r squared
    Cells(6, 4) = Round(inverseRSquared, 4) '4 decimal places

    ' removed (-2, -1, 0 time steps of calculated mean):
    ' Cells(11, 11).Formula = "=$D$3*(C11+$D$5)/(C11+$D$4+$D$5)"
    ' Cells(12, 11).Formula = "=$D$3*(C12+$D$5)/(C12+$D$4+$D$5)"
    ' Cells(13, 11) = "=$D$3*(C13+$D$5)/(C13+$D$4+$D$5)"

    Call formatSheetForPrint

    Call copyLearningCum(numTimeStepsInBand, band, "" & band & "_Stats") 'learning vs. cum
End Sub 'fill stats
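
' Illustrative addition (not part of the original module): several routines in this
' listing repeat a "While Not (i > 0#) ... Wend" loop to find the first row whose
' mean (column 3) is positive. As listed, that loop tests a variable i that is also
' used as a For counter, so it may already be positive on entry, and the loop has
' no upper bound if every mean is zero. The sketch below performs the same search
' with its own local variables and a row limit. The name FirstNonZeroRow and its
' parameters are assumptions made for illustration.

```vba
Function FirstNonZeroRow(ByVal sheetName As String, ByVal startRow As Long, _
                         ByVal lastRow As Long, ByVal colNum As Long) As Long
    Dim r As Long
    Dim v As Variant

    FirstNonZeroRow = 0 ' 0 signals that no positive value was found
    For r = startRow To lastRow
        v = Sheets(sheetName).Cells(r, colNum).Value
        ' Skip text and error cells; an empty cell compares as 0
        If IsNumeric(v) Then
            If v > 0 Then
                FirstNonZeroRow = r
                Exit Function
            End If
        End If
    Next r
End Function
```

' Usage, under the same assumptions:
'   j = FirstNonZeroRow(band & "_Stats", 14, 13 + numTimeStepsInBand, 3)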


' Sub: copyStatGraphs
' Author: Matt Behnke
' Created: 11/7/01
'
' Description: copies the affiliation statistics graphs
' inputs:
'
' Outputs:

Sub copyStatGraphs(ByVal timeSteps As Integer, ByVal band As String, ByVal source As String)

    Application.DisplayAlerts = False

    numSheets = Sheets.Count

    Windows("AffiliationMacro.xls").Activate
    Sheets("A_Band_Learning_Cap_per_k").Select
    Sheets("A_Band_Learning_Cap_per_k").Copy _
        After:=Workbooks(currFilename).Sheets(numSheets)

    ActiveChart.SeriesCollection(1).Select

    j = 14

    While Not (i > 0#) 'if the first time step's mean is zero find the step that doesnt have 0
        i = Sheets(source).Cells(j, 3).Value
        If Not (i > 0) Then
            j = j + 1
        End If
    Wend

    ActiveChart.SeriesCollection(1).Values = "=" & source & "!R" & j & "C6:R" & timeSteps + 13 _
        & "C6"
    ActiveChart.SeriesCollection(2).Values = "=" & source & "!R" & j + 1 & "C8:R" & timeSteps _
        + 13 & "C8"

    ActiveChart.ChartTitle.Characters.Text = "" & band & " Productivity Index (Cum over k)" & _
        Chr(10) _
        & technologyName & " (" & stepInterval & ")"

    ActiveSheet.Name = "" & band & "_Learning_Cap_per_k"

    ActiveChart.SeriesCollection(1).ErrorBars.Select
    ExecuteExcel4Macro _
        "ERRORBAR.Y(2,5,""=" & source & "!R" & j & "C7:R" & timeSteps + 13 & _
        "C7"",""=A_Band_Stats!$F$" & j & ":$F$" & timeSteps + 13 & """)"

    'move legend and textbox
    ActiveChart.Legend.Select
    Selection.Left = 431
    Selection.Top = 341

    'copy second graph
    Windows("AffiliationMacro.xls").Activate
    Sheets("A_Band_Learning_Cum").Select
    Sheets("A_Band_Learning_Cum").Copy After:=Workbooks(currFilename).Sheets(numSheets)
    ActiveChart.SeriesCollection(1).Select

    ActiveChart.SeriesCollection(1).XValues = "=" & source & "!R" & j & "C1:R" & timeSteps + _
        13 & "C1"
    ActiveChart.SeriesCollection(1).Values = "=" & source & "!R" & j & "C3:R" & timeSteps + 13 _
        & "C3"

    ActiveChart.SeriesCollection(2).XValues = "=" & source & "!R" & j & "C1:R" & timeSteps + _
        13 & "C1"
    ActiveChart.SeriesCollection(2).Values = "=" & source & "!R" & j & "C5:R" & timeSteps + 13 _
        & "C5"

    ActiveChart.SeriesCollection(1).Select
    With ActiveChart.SeriesCollection(1).Trendlines(1)
        'put trendline equation onto stats sheet
        Worksheets(source).Cells(2, 1).Value = .DataLabel.Text
        .DisplayRSquared = True
    End With

    ActiveChart.ChartTitle.Characters.Text = "" & band & " Productivity In Pubs (Cum over k)" & _
        Chr(10) _
        & technologyName & " (" & stepInterval & ")"

    ActiveSheet.Name = "" & band & "_Learning_Cum"

    Application.DisplayAlerts = True

End Sub


' copies the learning vs cumulative chart.

Sub copyLearningCum(ByVal timeSteps As Integer, ByVal band As String, ByVal source As String)


    Application.DisplayAlerts = False

    kappa = Sheets(source).Cells(3, 4)
    r = Sheets(source).Cells(4, 4)
    p = Sheets(source).Cells(5, 4)
    r2 = Sheets(source).Cells(6, 4)

    numSheets = Sheets.Count

    Windows("AffiliationMacro.xls").Activate
    Sheets("A_Band_Learning_Vs_Cum").Select
    Sheets("A_Band_Learning_Vs_Cum").Copy After:=Workbooks(currFilename).Sheets(numSheets)

    ActiveChart.PlotArea.Select

    j = 14

    While Not (i > 0#) 'if the first time step's mean is zero find the step that doesnt have 0
        i = Sheets(source).Cells(j, 3).Value
        If Not (i > 0) Then
            j = j + 1
        End If
    Wend

    ActiveChart.SeriesCollection(1).XValues = "=" & source & "!R" & j & "C3:R" & timeSteps + _
        13 & "C3"
    ActiveChart.SeriesCollection(1).Values = "=" & source & "!R" & j & "C6:R" & timeSteps + 13 _
        & "C6"

    ActiveChart.SeriesCollection(2).XValues = "=" & source & "!R" & j & "C3:R" & timeSteps + _
        13 & "C3"
    ActiveChart.SeriesCollection(2).Values = "=" & source & "!R14C13:R" & timeSteps + 13 & _
        "C13"

    'kappa
    ActiveChart.SeriesCollection(3).XValues = "=" & source & "!R" & j & "C5:R" & timeSteps + _
        13 & "C5"
    ActiveChart.SeriesCollection(3).Values = "=" & source & "!R14C10:R" & timeSteps + 13 & _
        "C10"

    '3 sigma 3 sigma
    ActiveChart.SeriesCollection(4).XValues = "=" & source & "!R" & j & "C5:R" & timeSteps + _
        13 & "C5" 'E
    ActiveChart.SeriesCollection(4).Values = "=" & source & "!R14C8:R" & timeSteps + 13 & _
        "C8" 'H

    'kappa/2
    ActiveChart.SeriesCollection(5).XValues = "=" & source & "!R" & j & "C5:R" & timeSteps + _
        13 & "C5"
    ActiveChart.SeriesCollection(5).Values = "=" & source & "!R14C11:R" & timeSteps + 13 & _
        "C11"

    'r-p
    ActiveChart.SeriesCollection(6).XValues = "=" & source & "!R" & j & "C12:R" & timeSteps + _
        13 & "C12"
    ActiveChart.SeriesCollection(6).Values = "=" & source & "!R14C6:R" & timeSteps + 13 & _
        "C6"

    ' ActiveChart.SeriesCollection(1).ErrorBars.Select
    ' ExecuteExcel4Macro _
    '     "ERRORBAR.Y(2,5,""=" & source & "!R" & j & "C7:R" & timeSteps + 13 & _
    '     "C7"",""=A_Band_Stats!$F$" & j & ":$F$" & timeSteps + 13 & """)"

    ActiveChart.Shapes("Text Box 6").Select
    Selection.Characters.Text = "K= " & kappa & Chr(10) & "r= " & r & Chr(10) & "p= " & p & _
        Chr(10) & "" & Chr(10) & "R2= " & r2 & Chr(10) & "" & Chr(10) & ""

    ActiveChart.ChartTitle.Characters.Text = "Learning Curve -- " & band & _
        " (Mean and Capacity)" & Chr(10) _
        & technologyName & " (" & stepInterval & ")"

    ActiveSheet.Name = "" & band & "_Learning_Vs_Cum"

    Application.DisplayAlerts = True

End Sub


' Sub: CopyABCDGraph
' Author: Matt Behnke
' Created: 11/7/01
'
' Description: copies the ABCD band mean graph and changes the dataseries to point to the
'              right data.
' inputs:
'
' Outputs:

Sub CopyABCDGraph()


    Application.DisplayAlerts = False

    kappa = Sheets("A_Band_Stats").Cells(3, 4)
    r = Sheets("A_Band_Stats").Cells(4, 4)
    p = Sheets("A_Band_Stats").Cells(5, 4)
    r2 = Sheets("A_Band_Stats").Cells(6, 4)

    Windows("AffiliationMacro.xls").Activate
    Sheets("ABCD_Band_Learning_Vs_Cum").Select
    Sheets("ABCD_Band_Learning_Vs_Cum").Copy After:=Workbooks( _
        currFilename).Sheets(Sheets.Count)

    ActiveChart.PlotArea.Select
    ' ActiveChart.SeriesCollection(3).Delete
    currChartName = ActiveChart.Name

    timeSteps = CountCols("Affiliation_Cum_Dist_A_Band", 1) - 3

    'aband
    j = 14

    While Not (i > 0#) 'if the first time step's mean is zero find the step that doesnt have 0
        i = Sheets("A_Band_Stats").Cells(j, 3).Value
        If Not (i > 0) Then
            j = j + 1
        End If
    Wend

    Charts(currChartName).Select

    'aband mean
    ActiveChart.SeriesCollection(1).XValues = "=A_Band_Stats!R" & j & "C3:R" & timeSteps + _
        13 & "C3"
    ActiveChart.SeriesCollection(1).Values = "=A_Band_Stats!R" & j & "C6:R" & timeSteps + 13 _
        & "C6"

    'calc y
    ActiveChart.SeriesCollection(2).XValues = "=A_Band_Stats!R" & j + 1 & "C3:R" & timeSteps _
        + 13 & "C3"
    ActiveChart.SeriesCollection(2).Values = "=A_Band_Stats!R" & j + 1 & "C13:R" & timeSteps _
        + 13 & "C13"

    '3 sigma 3 sigma
    ActiveChart.SeriesCollection(3).XValues = "=A_Band_Stats!R" & j + 1 & "C5:R" & timeSteps _
        + 13 & "C5"
    ActiveChart.SeriesCollection(3).Values = "=A_Band_Stats!R" & j & "C15:R" & timeSteps + _
        13 & "C15"

    'aband kappa
    ActiveChart.SeriesCollection(7).XValues = "=A_Band_Stats!R" & j & "C5:R" & timeSteps + _
        13 & "C5"
    ActiveChart.SeriesCollection(7).Values = "=A_Band_Stats!R" & j & "C10:R" & timeSteps + _
        13 & "C10"

    'aband 3sig 3sig
    ActiveChart.SeriesCollection(8).XValues = "=A_Band_Stats!R" & j & "C5:R" & timeSteps + _
        13 & "C5"
    ActiveChart.SeriesCollection(8).Values = "=A_Band_Stats!R" & timeSteps + 14 & "C8"

    'aband kappa /2
    ActiveChart.SeriesCollection(9).XValues = "=A_Band_Stats!R" & j & "C5:R" & timeSteps + _
        13 & "C5"
    ActiveChart.SeriesCollection(9).Values = "=A_Band_Stats!R14C11:R" & timeSteps + 13 & _
        "C11"

    'aband r-p
    ActiveChart.SeriesCollection(10).XValues = "=A_Band_Stats!R" & j & "C12:R" & timeSteps _
        + 13 & "C12"
    ActiveChart.SeriesCollection(10).Values = "=A_Band_Stats!R14C6:R" & timeSteps + 13 & _
        "C6"

    'ActiveChart.SeriesCollection(1).ErrorBars.Select
    'ExecuteExcel4Macro _
    '    "ERRORBAR.Y(2,5,""=A_Band_Stats!R" & j & "C7:R" & timeSteps + 13 & _
    '    "C7"",""=A_Band_Stats!$F$" & j & ":$F$" & timeSteps + 13 & """)"

    'bband ********** this is CORRECT.
    j = 14

    While Not (i > 0#) 'if the first time step's mean is zero find the step that doesnt have 0
        i = Sheets("B_Band_Stats").Cells(j, 3).Value
        If Not (i > 0) Then
            j = j + 1
        End If
    Wend

    ActiveChart.SeriesCollection(4).XValues = "=B_Band_Stats!R" & j & "C3:R" & timeSteps + _
        13 & "C3"
    ActiveChart.SeriesCollection(4).Values = "=B_Band_Stats!R" & j & "C6:R" & timeSteps + 13 _
        & "C6"

    'cband
    j = 14

    While Not (i > 0#) 'if the first time step's mean is zero find the step that doesnt have 0
        i = Sheets("C_Band_Stats").Cells(j, 3).Value
        If Not (i > 0) Then
            j = j + 1
        End If
    Wend

    ActiveChart.SeriesCollection(5).XValues = "=C_Band_Stats!R" & j & "C3:R" & timeSteps + _
        13 & "C3"
    ActiveChart.SeriesCollection(5).Values = "=C_Band_Stats!R" & j & "C6:R" & timeSteps + 13 _
        & "C6"

    'dband
    j = 14

    While Not (i > 0#) 'if the first time step's mean is zero find the step that doesnt have 0
        i = Sheets("D_Band_Stats").Cells(j, 3).Value
        If Not (i > 0) Then
            j = j + 1
        End If
    Wend

    ActiveChart.SeriesCollection(6).XValues = "=D_Band_Stats!R" & j & "C3:R" & timeSteps + _
        13 & "C3"
    ActiveChart.SeriesCollection(6).Values = "=D_Band_Stats!R" & j & "C6:R" & timeSteps + 13 _
        & "C6"

    'kappa textbox
    ActiveChart.Shapes("Text Box 7").Select
    Selection.Characters.Text = "K= " & kappa & Chr(10) & "r= " & r & Chr(10) & "p= " & p & _
        Chr(10) & "" & Chr(10) & "R2= " & r2 & Chr(10) & "" & Chr(10) & ""

    ActiveChart.ChartTitle.Characters.Text = ActiveChart.ChartTitle.Characters.Text & Chr(10) _
        & technologyName & " (" & stepInterval & ")"

    Application.DisplayAlerts = True

End Sub


' Sub: CopyBandSummaryGraphs
' Author: Matt Behnke
' Created: 12/13/01
'
' Description: Copies the band ENTROPY graphs and published messages summary graphs
' inputs: band name
'
' Outputs:

Sub CopyBandSummaryGraphs(ByVal band As String)

    Application.DisplayAlerts = False

    If band = "World" Then
        source = "Affiliation_Summary"
    Else
        source = "Affiliation_Summary_" & band
    End If

    numRows = CountRows(source, 1)

    'GRAPH ONE message_N_k+1 vs N_k
    Windows("AffiliationMacro.xls").Activate
    Sheets("A_Band_Message_N_k+1 vs N_k").Select
    Sheets("A_Band_Message_N_k+1 vs N_k").Copy _
        Before:=Workbooks(currFilename).Sheets(band & "_Stats")

    ActiveChart.SeriesCollection(1).Select

    j = 4

    While Not (i > 0#) 'if the first time step's mean is zero find the step that doesnt have 0
        i = Sheets(source).Cells(j, 3).Value
        If Not (i > 0) Then
            j = j + 1
        End If
    Wend

    If Sheets(source).Cells(i, 2).Characters(1, 1).Text = "1" And Sheets(source).Cells(i, _
        2).Characters(2, 1).Text = "/" Then
        yx = j
    Else
        yx = j - 1
    End If

    ActiveChart.SeriesCollection(1).XValues = "=" & source & "!R" & j & "C3:R" & numRows - 1 _
        & "C3"
    ActiveChart.SeriesCollection(1).Values = "=" & source & "!R" & j + 1 & "C3:R" & numRows _
        & "C3"

    'y=x:
    ActiveChart.SeriesCollection(2).XValues = "=" & source & "!R" & yx & "C3:R" & numRows _
        & "C3"
    ActiveChart.SeriesCollection(2).Values = "=" & source & "!R" & yx & "C3:R" & numRows & _
        "C3"

    titleBefore = ActiveChart.ChartTitle.Characters.Text
    ActiveChart.ChartTitle.Characters.Text = band & " " & titleBefore & Chr(10) _
        & technologyName & " (" & stepInterval & ")"

    'place subscripts in the chart title (N_k+1, N_k)
    If band = "World" Then
        ActiveChart.ChartTitle.Select
        With Selection.Characters(Start:=29, Length:=3).Font
            .Subscript = True
        End With
        With Selection.Characters(Start:=34, Length:=1).Font
            .Subscript = True
        End With
    Else
        ActiveChart.ChartTitle.Select
        With Selection.Characters(Start:=30, Length:=3).Font
            .Subscript = True
        End With
        With Selection.Characters(Start:=35, Length:=1).Font
            .Subscript = True
        End With
    End If

    ActiveSheet.Name = "" & band & "_Message_N_k+1 vs N_k"

    'copy second graph S_k+1 vs S_k
    Windows("AffiliationMacro.xls").Activate
    Sheets("A_Band_World_S_k+1 vs S_k").Select
    Sheets("A_Band_World_S_k+1 vs S_k").Copy _
        Before:=Workbooks(currFilename).Sheets(band & "_Stats")

    ActiveChart.SeriesCollection(1).Select

    ActiveChart.SeriesCollection(1).XValues = "=" & source & "!R" & j & "C6:R" & numRows - 1 _
        & "C6"
    ActiveChart.SeriesCollection(1).Values = "=" & source & "!R" & j + 1 & "C6:R" & numRows _
        & "C6"

    'y=x:
    ActiveChart.SeriesCollection(2).XValues = "=" & source & "!R" & yx & "C6:R" & numRows _
        & "C6"
    ActiveChart.SeriesCollection(2).Values = "=" & source & "!R" & yx & "C6:R" & numRows & _
        "C6"

    titleBefore = ActiveChart.ChartTitle.Characters.Text
    ActiveChart.ChartTitle.Characters.Text = band & " " & titleBefore & Chr(10) _
        & technologyName & " (" & stepInterval & ")"

    ActiveSheet.Name = "" & band & "_Entropy_S_k+1 vs S_k"

    'place subscripts in the chart title (Entropy S_k+1, S_k)
    If band = "World" Then
        ActiveChart.ChartTitle.Select
        With Selection.Characters(Start:=24, Length:=3).Font
            .Subscript = True
        End With
        With Selection.Characters(Start:=34, Length:=1).Font
            .Subscript = True
        End With
    Else
        ActiveChart.ChartTitle.Select
        With Selection.Characters(Start:=25, Length:=3).Font
            .Subscript = True
        End With
        With Selection.Characters(Start:=35, Length:=1).Font
            .Subscript = True
        End With
    End If

    If band = "World" Then

    Else
        'copy third graph S(Y)_k+1 vs S_world_k
        Windows("AffiliationMacro.xls").Activate
        Sheets("A_Band_S(Y)_k+1 Vs S_world_k").Select
        Sheets("A_Band_S(Y)_k+1 Vs S_world_k").Copy _
            Before:=Workbooks(currFilename).Sheets(band & "_Stats")

        ActiveChart.SeriesCollection(1).Select

        ActiveChart.SeriesCollection(1).XValues = "=Affiliation_Summary!R" & j & "C6:R" & _
            numRows & "C6"
        ActiveChart.SeriesCollection(1).Values = "=" & source & "!R" & j + 1 & "C6:R" & _
            numRows & "C6"

        'y=x
        ActiveChart.SeriesCollection(2).XValues = "=Affiliation_Summary!R" & yx & "C6:R" & _
            numRows & "C6"
        ActiveChart.SeriesCollection(2).Values = "=Affiliation_Summary!R" & yx & "C6:R" & _
            numRows & "C6"

        titleBefore = ActiveChart.ChartTitle.Characters.Text
        ActiveChart.ChartTitle.Characters.Text = band & " " & titleBefore & Chr(10) _
            & technologyName & " (" & stepInterval & ")"

        'subscripts in chart title
        ActiveChart.ChartTitle.Select
        With Selection.Characters(Start:=24, Length:=4).Font
            .Subscript = True
        End With
        With Selection.Characters(Start:=33, Length:=3).Font
            .Subscript = True
        End With
        With Selection.Characters(Start:=39, Length:=5).Font
            .Subscript = True
        End With
        With Selection.Characters(Start:=49, Length:=1).Font
            .Subscript = True
        End With

        ActiveSheet.Name = "" & band & "_S(X,Y)_k+1 vs S_world_k"
    End If

    Application.DisplayAlerts = True
End Sub 'copy summary band graphs


' Sub: FillBandAuthors
' Author: Matt Behnke
' Created: 11/7/01
'
' Description: fills in a band's author distribution by copying a row from the list of
'              affiliations with the number of authors as the matrix's values.
' inputs: band name
'
' Outputs:

Sub FillBandAuthors(ByVal band As String)

    Sheets.Add After:=Worksheets(Worksheets.Count)

    numRowsInBand = CountRows("Affiliation_Cum_Dist_" & band, 1)
    numRowsInAuthors = CountRows("Affiliation_Authors", 1)

    Sheets(Worksheets.Count).Select
    currSheetName = ActiveSheet.Name
    Columns("C:C").ColumnWidth = 62.43

    Sheets(currSheetName).Move Before:=Sheets("" & band & "_Stats")

    Sheets("Affiliation_Authors").Select
    Rows("1:1").Select
    Selection.Copy
    Sheets(currSheetName).Select
    Rows("1:1").Select
    ActiveSheet.Paste

    counter = 2

    For i = 2 To numRowsInBand 'copy rows from datasheet into band
        affiliationName = Sheets("Affiliation_Cum_Dist_" & band).Cells(i, 3).Value
        For j = 2 To numRowsInAuthors
            If Sheets("Affiliation_Authors").Cells(j, 3).Value = affiliationName Then
                Sheets("Affiliation_Authors").Select
                Rows(j & ":" & j).Select
                Selection.Copy
                Sheets(currSheetName).Select
                Rows(counter & ":" & counter).Select
                ActiveSheet.Paste
                counter = counter + 1
            End If
        Next j
    Next i

    numRowsInAuthorBand = CountRows(currSheetName, 1)
    numColumns = CountCols(currSheetName, 1) 'num time steps

    Cells(numRowsInAuthorBand + 1, 3) = "Count"
    Cells(numRowsInAuthorBand + 2, 3) = "Mean"
    Cells(numRowsInAuthorBand + 3, 3) = "Std Dev"
    Cells(numRowsInAuthorBand + 4, 3) = "Sum"

    For i = 4 To numColumns 'put in the mean and std deviation for each time step
        ' For j = 2 To numRowsInAuthorBand
        '     If (Cells(j, i) > 0 And i > 4) Or (i > 4 And Cells(j, i - 1) > 0) Then
        '         Cells(j, i) = Cells(j, i) + Cells(j, i - 1)
        '     End If
        ' Next j

        Cells(numRowsInAuthorBand + 4, i) = "=Sum(" & col(i) & "2:" & col(i) & _
            numRowsInAuthorBand & ")"

        'add count, avg, stdev...
        Cells(numRowsInAuthorBand + 1, i).Formula = "=Countif(" & col(i) & "2:" & col(i) & _
            numRowsInAuthorBand & ", "">0"")"

        If Cells(numRowsInAuthorBand + 1, i) > 0 Then
            Cells(numRowsInAuthorBand + 2, i).Formula = "=AVERAGE(" & col(i) & "2:" & col(i) _
                & numRowsInAuthorBand & ")"
            If Cells(numRowsInAuthorBand + 1, i) > 1 Then 'more than one so compute std deviation
                Cells(numRowsInAuthorBand + 3, i).Formula = "=STDEV(" & col(i) & "2:" & col(i) & _
                    numRowsInAuthorBand & ")"
            End If
        End If
    Next i

    ActiveSheet.Name = "Aff_Author_Cum_Dist_" & band & ""

    Call formatSheetForPrint

End Sub 'band authors


' Sub: CalcCumulative
' Author: Matt Behnke
' Created: 11/15/01
'
' Description: processes the input sheet (a matrix) to calculate the cumulative number of
'              instances per time step.
' inputs: sheetName
'
' Outputs:

Sub CalcCumulative(ByVal sheetName As String)

    numRows = CountRows(sheetName, 1)
    numCols = CountCols(sheetName, 1)

    Sheets(sheetName).Select

    If sheetName = affiliationDescMatrix Then
        For i = 2 To numRows
            cellSum = 0
            prevSum = 0
            curSum = 0
            For j = 6 To numCols
                prevSum = Cells(i, j - 1)
                curSum = Cells(i, j)
                cellSum = prevSum + curSum
                If cellSum > 0 Then
                    Cells(i, j) = cellSum
                End If
            Next j
        Next i

    '
    ' For the authors matrix zeros must be put in when
    ' there is no publication in an instance
    '
    ElseIf sheetName = "Affiliation_authors" Or sheetName = dataSheet Then
        For i = 2 To numRows
            cellSum = 0
            prevSum = 0
            curSum = 0
            For j = 4 To numCols
                If j > 4 Then 'when not in first column
                    prevSum = Cells(i, j - 1)
                    curSum = Cells(i, j)
                    cellSum = prevSum + curSum
                    Cells(i, j) = cellSum
                Else 'in first column
                    If Not Cells(i, j) > 0 Then
                        Cells(i, j) = 0
                    End If
                End If
            Next j
        Next i

    Else
        For i = 2 To numRows
            cellSum = 0
            prevSum = 0
            curSum = 0
            For j = 5 To numCols
                prevSum = Cells(i, j - 1)
                curSum = Cells(i, j)
                cellSum = prevSum + curSum
                If cellSum > 0 Then
                    Cells(i, j) = cellSum
                End If
            Next j
        Next i
    End If

    If sheetName = dataSheet Or sheetName = "Affiliation_authors" Then
        'put count, mean, and stdev in each column
        For i = 4 To numCols 'put in the mean and std deviation for each time step
            Cells(numRows + 4, i).Formula = "=Sum(" & col(i) & "2:" & col(i) & numRows & ")" 'sum
            Cells(numRows + 1, i).Formula = "=Countif(" & col(i) & "2:" & col(i) & numRows & _
                ", "">0"")"

            If Cells(numRows + 1, i) > 0 Then
                Cells(numRows + 2, i).Formula = "=AVERAGE(" & col(i) & "2:" & col(i) & numRows _
                    & ")"
                If Cells(numRows + 1, i) > 1 Then 'more than one so compute std deviation
                    Cells(numRows + 3, i).Formula = "=STDEV(" & col(i) & "2:" & col(i) & numRows _
                        & ")"
                End If
            End If
        Next i
    Else
        'put in the sum of the columns
        For i = 4 To numCols
            Cells(numRows + 1, i).Formula = "=Sum(" & col(i) & "2:" & col(i) & numRows & ")"
        Next i
    End If 'sheetName = dataSheet
End Sub
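
' Illustrative addition (not part of the original module): the cumulative step that
' CalcCumulative applies to the authors and dataSheet matrices, where every cell
' becomes the running total of its row and empty cells become 0, sketched for a
' single row. The name AccumulateRow and its parameters are assumptions made for
' this sketch; treating non-numeric cells as 0 is a choice of the sketch, not
' something the original code does.

```vba
Sub AccumulateRow(ByVal ws As Worksheet, ByVal rowNum As Long, _
                  ByVal firstCol As Long, ByVal lastCol As Long)
    Dim c As Long
    Dim v As Variant
    Dim running As Double

    running = 0
    For c = firstCol To lastCol
        v = ws.Cells(rowNum, c).Value
        ' Empty or non-numeric cells add nothing to the running total
        If Not IsEmpty(v) And IsNumeric(v) Then running = running + CDbl(v)
        ' Overwrite the per-step count with the cumulative count
        ws.Cells(rowNum, c).Value = running
    Next c
End Sub
```

' Usage, under the same assumptions: a row holding 2, (empty), 3 in columns 4 to 6
' becomes 2, 2, 5 after
'   AccumulateRow ActiveSheet, 2, 4, 6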


' Sub: FillBandTerms
' Author: Matt Behnke
' Created: 11/7/01
'
' Description: fills in the term instances for a band
' inputs: band name
'
' Outputs:

Sub FillBandTerms(ByVal band As String)

    Sheets.Add After:=Worksheets(Worksheets.Count)
    counter = 2

    affiliationDescMatrix = "descriptor_matrix_affil"
    numRowsInBand = CountRows("Affiliation_Cum_Dist_" & band, 1)
    numColumnsInTerms = CountCols(affiliationDescMatrix, 1)

    Sheets(Worksheets.Count).Select
    currSheetName = ActiveSheet.Name
    Columns("C:C").ColumnWidth = 32.43

    Sheets(currSheetName).Move Before:=Sheets("" & band & "_Stats")

    'header
    Cells(1, 1) = Sheets(affiliationDescMatrix).Cells(1, 1)
    Cells(1, 2) = Sheets(affiliationDescMatrix).Cells(1, 2)
    Cells(1, 3) = Sheets(affiliationDescMatrix).Cells(1, 3)
    For i = 4 To numColumnsInTerms - 1 'copy time interval header
        Cells(1, i) = Sheets("descriptor_matrix_affil").Cells(1, i + 1)
    Next i

'fill in the terms and instances...
For i = 2 To numRowsInBand 'copy rows from datasheet into band
    affiliationName = Sheets("Affiliation_Cum_Dist_" & band).Cells(i, 3).Value
    For j = 2 To CountRows(affiliationDescMatrix, 1)
        If Sheets(affiliationDescMatrix).Cells(j, 4) = affiliationName Then
            rowInAffiliationDescMatrix = j
            termName = Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, 3)

            'check to see if term exists already on band's list of terms
            termRowInBand = findStringRowInSheet(currSheetName, termName, 3)

            If termRowInBand > 0 Then
                Cells(termRowInBand, 2) = Cells(termRowInBand, 2) + _
                    Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, 2)
                cellSum = 0
                prevSum = 0
                curSum = 0

                If Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, 1).Value < _
                    Cells(termRowInBand, 1) Then
                    Cells(termRowInBand, 1) = _
                        Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, 1).Value
                End If

                For z = 4 To numColumnsInTerms 'add the values for each time step to what already exists
                    If z > 4 Then 'add cumulative sum of term instances (previous + current + numInstances)
                        'prevSum = Cells(termRowInBand, z - 1)
                        If Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, z + 1) > 0 Then
                            curSum = Cells(termRowInBand, z) + _
                                Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, z + 1)
                        Else
                            curSum = 0
                        End If
                        'cellSum = prevSum + curSum
                        If curSum > 0 Then
                            Cells(termRowInBand, z) = curSum
                        End If
                    Else
                        If Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, z + 1) > 0 Then
                            Cells(termRowInBand, z) = Cells(termRowInBand, z) + _
                                Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, z + 1)
                        End If
                    End If
                Next z

            Else 'term not found
                Cells(counter, 1) = Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, 1)
                Cells(counter, 2) = Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, 2)
                Cells(counter, 3) = Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, 3)

                For z = 4 To numColumnsInTerms
                    If z > 4 Then 'add cumulative sum of term instances (previous + current + numInstances)
                        'prevSum = Cells(counter, z - 1)
                        curSum = Cells(counter, z) + _
                            Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, z + 1)
                        'cellSum = prevSum + curSum
                        If curSum > 0 Then
                            Cells(counter, z) = curSum
                        End If
                    Else
                        If Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, z + 1) > 0 Then
                            Cells(counter, z) = Cells(counter, z) + _
                                Sheets(affiliationDescMatrix).Cells(rowInAffiliationDescMatrix, z + 1)
                        End If 'eliminates zeros
                    End If 'z = 4
                Next z

                counter = counter + 1
            End If 'if-found-else-not
        End If 'affiliation name matches
    Next j
Next i


numRows = CountRows(ActiveSheet.Name, 1) 
numCols = CountCols(ActiveSheet.Name, 1) 

For i = 4 To numCols 

Cells(numRows + 1, i).Formula = "=Sum(" & col(i) & "2:" & col(i) & numRows & ")" 
Next i 

'Call CalcCumulative(ActiveSheet.Name) 

ActiveSheet.Name = "Term_Dist_" & band & "" 

Call formatSheetForPrint 
End Sub 'fill band terms


' Sub: FillBandTermsEntropy
' Author: Matt Behnke
' Created: 11/17/01
' Description: computes the entropy of a band's terms
'              and the contribution of the band.
' inputs: band name
' Outputs:


Sub FillBandTermsEntropy(ByVal band As String)
Sheets.Add After:=Worksheets(Worksheets.Count)

numRows = CountRows("Term_Dist_" & band, 1)
numColumns = CountCols("Term_Dist_" & band, 1)

Sheets(Worksheets.Count).Select
currSheetName = ActiveSheet.Name
Columns("C:C").ColumnWidth = 32.43

Sheets(currSheetName).Move Before:=Sheets("" & band & "_Stats")
numRowsWorld = CountRows(descriptorMatrixSheet, 1)

'copy term distribution sheet for entropy
Worksheets("Term_Dist_" & band & "").Range("A1:" & col(numColumns) & numRows).Copy _
    Destination:=Worksheets(currSheetName).Range("A1")

For i = 2 To numRows
    termName = Sheets(currSheetName).Cells(i, 3)
    termCount = Sheets(currSheetName).Cells(i, 2)

    termRowInWorldEntropy = findStringRowInSheet(worldEntropySheet, termName, 3)
    termRowInDescriptorMatrix = termRowInWorldEntropy

    For z = 4 To numColumns
        If Sheets(currSheetName).Cells(i, z).Value >= 1 Then
            termCountInBandInStep = Sheets(currSheetName).Cells(i, z)
            sumInstancesBand = Sheets("Term_Dist_" & band & "").Cells(numRows + 1, z)
            pTerm = termCountInBandInStep / sumInstancesBand
            entropyTerm = -pTerm * (Log(pTerm) / Log(2))

            Sheets(currSheetName).Cells(i, z) = entropyTerm
        End If
    Next z
Next i


Sheets(currSheetName).Select
Cells(numRows + 1, 3) = "Sum"
Cells(numRows + 2, 3) = "Contribution"
Cells(numRows + 3, 3) = "Difference"

For i = 4 To numColumns
    Cells(numRows + 1, i).Formula = "=Sum(" & col(i) & "2:" & col(i) & numRows & ")"

    numInstancesWorld = Sheets(descriptorMatrixSheet).Cells(numRowsWorld + 1, i)
    numInstancesBand = Sheets("Term_Dist_" & band & "").Cells(numRows + 1, i)

    If numInstancesBand > 0 Then
        ratio1 = numInstancesWorld / numInstancesBand
        ratio2 = numInstancesBand / numInstancesWorld
        entropySum = Cells(numRows + 1, i)

        contributionOfBand = ratio2 * entropySum + (ratio2 * (Log(ratio1) / Log(2)))

        Cells(numRows + 2, i) = contributionOfBand
        Cells(numRows + 3, i) = Abs(entropySum - contributionOfBand)
    Else
        Cells(numRows + 2, i) = 0
        Cells(numRows + 3, i) = 0
    End If
Next i

ActiveSheet.Name = "Term_Entropy_Dist_" & band & ""

Call formatSheetForPrint

End Sub 'fill band terms entropy
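
' Editorial sketch, not part of the original macro: the per-term value written by
' FillBandTermsEntropy is the Shannon term -p * log2(p), where p is the number of
' instances of the term in a time step divided by all instances in the band for
' that step. VBA's Log is the natural logarithm, hence the division by Log(2).
' TermEntropyBits is a hypothetical helper name; returning 0 for empty input is
' an assumption that mirrors the macro skipping cells below 1.
Function TermEntropyBits(ByVal termCount As Double, ByVal total As Double) As Double
    Dim p As Double
    If termCount <= 0 Or total <= 0 Then
        TermEntropyBits = 0 'an empty cell contributes no entropy
    Else
        p = termCount / total
        TermEntropyBits = -p * (Log(p) / Log(2))
    End If
End Function
' Example: two terms with one instance each out of two give 0.5 + 0.5 = 1 bit.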


' Sub: affiliationBandSummary
' Author: Matt Behnke
' Created: 11/30/01
' Description: creates the summary sheet for the band;
'              shows step, number of records, authors, terms, entropy.
' inputs: band - the name of the band
' Outputs: none


Sub affiliationBandSummary(ByVal band As String)

numColumns = CountCols("Term_Entropy_Dist_" & band, 1)
numRowsAffiliation = CountRows("Affiliation_Cum_Dist_" & band, 1)
numRowsAuthor = CountRows("Aff_Author_Cum_Dist_" & band, 1)
numRowsTermDist = CountRows("Term_Dist_" & band, 1)
numRowsTermEntropy = CountRows("Term_Entropy_Dist_" & band, 1)

Sheets.Add After:=Worksheets(Worksheets.Count)
Sheets(Worksheets.Count).Select
currSheetName = ActiveSheet.Name
Sheets(currSheetName).Move Before:=Sheets(band & "_Stats")

Sheets(currSheetName).Select
ActiveSheet.StandardWidth = 13

Cells(1, 1) = " "
Cells(2, 1) = ""
Cells(2, 3) = "Instances (Previous + Current)"
Cells(3, 1) = "Step"
Cells(3, 2) = "interval"
Cells(3, 3) = "Records"
Cells(3, 4) = "Authors"
Cells(3, 5) = "Terms"
Cells(3, 6) = "Entropy"
Cells(3, 7) = "Contribution"
Cells(3, 8) = "Difference"
Cells(3, 9) = "Rec / Author"

For i = 4 To numColumns
    Cells(i, 1) = i - 3
    Cells(i, 2) = Sheets("Term_Dist_" & band).Cells(1, i)

    Cells(i, 3).Value = "=SUM(Affiliation_Cum_Dist_" & band & "!" & col(i) & "$2:" & col(i) _
        & "$" & numRowsAffiliation & ")"
    Cells(i, 4).Value = "=SUM(Aff_Author_Cum_Dist_" & band & "!" & col(i) & "$2:" & col(i) _
        & "$" & numRowsAuthor & ")"
    Cells(i, 5).Value = "=SUM(Term_Dist_" & band & "!" & col(i) & "$2:" & col(i) & "$" & _
        numRowsTermDist & ")"

    Cells(i, 6) = Sheets("Term_Entropy_Dist_" & band).Cells(numRowsTermEntropy + 1, i)
    Cells(i, 7) = Sheets("Term_Entropy_Dist_" & band).Cells(numRowsTermEntropy + 2, i)
    Cells(i, 8) = Sheets("Term_Entropy_Dist_" & band).Cells(numRowsTermEntropy + 3, i)

    If Cells(i, 4) > 0 Then
        Cells(i, 9) = Cells(i, 3) / Cells(i, 4)
    End If
Next i

ActiveSheet.Name = "Affiliation_Summary_" & band
End Sub


' Sub: affiliationSummary 
' Author: Matt Behnke 
' Created: 11/30/01 
' added stuff: 2/1/02
' Description: creates the world affiliation summary sheet.
'              this is the first part; the second part puts in the temp poly and the pressure
'              equations after fill months has been run on the sheet.
' inputs: none
' Outputs: none


Sub affiliationSummary()

numColumns = CountCols(worldEntropySheet, 1)
numRowsAffiliation = CountRows(dataSheet, 1)
numRowsAuthor = CountRows("Affiliation_authors", 1)
numRowsTermDist = CountRows(descriptorMatrixSheet, 1)
numRowsTermEntropy = CountRows(worldEntropySheet, 1)

Sheets.Add After:=Worksheets(Worksheets.Count)
Sheets(Worksheets.Count).Select
currSheetName = ActiveSheet.Name

Sheets(currSheetName).Move After:=Sheets(Sheets.Count)

Sheets(currSheetName).Select
ActiveSheet.StandardWidth = 13

Cells(1, 1) = " "
Cells(2, 1) = " "
Cells(2, 3) = "Instances (Previous + Current)"
Cells(3, 1) = "Step"
Cells(3, 2) = "interval"
Cells(3, 3) = "Records"
Cells(3, 4) = "Authors (v_X)"
Cells(3, 5) = "Rec / Author"
Cells(3, 6) = "Terms X"
Cells(3, 7) = "Terms Y"
Cells(3, 8) = "S(X)"
Cells(3, 9) = "S(Y)"
Cells(3, 10) = "S(X,Y)"
Cells(3, 11) = "S(X;Y)"
Cells(3, 12) = "deltanx"
Cells(3, 13) = "deltasx"
Cells(3, 14) = "T_X Saboe Degrees"
Cells(3, 15) = "deltany"
Cells(3, 16) = "deltasy"
Cells(3, 17) = "v_Y_nodes"
Cells(3, 18) = "pressure n per node"

For i = 4 To numColumns
    Cells(i, 1) = i - 3
    Cells(i, 2) = Sheets(dataSheet).Cells(1, i)

    If i > 4 Then
        Cells(i, 3).Value = "=SUM(" & dataSheet & "!" & col(i) & "$2:" & col(i) & "$" & _
            numRowsAffiliation & ")" '+ C" & i - 1
    Else
        Cells(i, 3).Value = "=SUM(" & dataSheet & "!" & col(i) & "$2:" & col(i) & "$" & _
            numRowsAffiliation & ")"
    End If

    If i > 4 Then
        Cells(i, 4).Value = "=SUM(Affiliation_authors!" & col(i) & "$2:" & col(i) & "$" & _
            numRowsAuthor & ")" ' + D" & i - 1
    Else
        Cells(i, 4).Value = "=SUM(Affiliation_authors!" & col(i) & "$2:" & col(i) & "$" & _
            numRowsAuthor & ")"
    End If

    Cells(i, 5) = Cells(i, 3) / Cells(i, 4)

    Cells(i, 6).Value = "=SUM(" & descriptorMatrixSheet & "!" & col(i) & "$2:" & col(i) & "$" _
        & numRowsTermDist & ")"
    Cells(i, 7).Value = "=SUM(" & descriptorMatrixSheetY & "!" & col(i) & "$2:" & col(i) & _
        "$" & numRowsTermDist & ")"

    Cells(i, 8) = Sheets(worldEntropySheet).Cells(numRowsTermEntropy + 1, i)
    Cells(i, 9) = Sheets(worldEntropySheetY).Cells(numRowsTermEntropy + 1, i)
    Cells(i, 10) = Sheets(worldEntropySheet).Cells(numRowsTermEntropy + 1, numColumns)
    Cells(i, 11) = "=" & col(8) & i & "+" & col(9) & i & "-" & col(10) & i 'Cells(i, 8) + Cells(i, 9) - Cells(i, 10)

    If i > 4 Then
        Cells(i, 12) = "=" & col(6) & i & "-" & col(6) & i - 1 'cells(i, 6) - cells(i-1, 6) delta_n_y
        Cells(i, 13) = "=" & col(9) & i - 1 & "-" & col(9) & i 'cells(i, 9) - cells(i-1, 9) delta_s_x
        Cells(i, 14) = "=" & col(12) & i & "/" & col(13) & i 'cells(i, 12) / cells(i, 13) T_X
        Cells(i, 15) = "=" & col(7) & i & "-" & col(7) & i - 1 'cells(i, 7) - cells(i-1, 7) delta_n_y
        Cells(i, 16) = "=" & col(8) & i & "-" & col(8) & i - 1 'cells(i, 8) - cells(i-1, 8) delta_s_y
    End If

    Cells(i, 17) = Sheets("Affiliation_authors").Cells(numRowsAuthor + 1, numColumns) - _
        Cells(i, 4)
    Cells(i, 18) = "=" & col(6) & i & "/" & col(4) & i 'cells(i,6) / cells(i,4) terms X / author X
Next i

Cells(4, 3).Select 'freeze panes
ActiveWindow.FreezePanes = True

ActiveSheet.Name = "Affiliation_Summary"

Call fillMonthsRow("Affiliation_Summary", 4)

'Call CopyInteractingSystemsGraphs(ActiveSheet.Name, numColumns)
End Sub


' Sub: affiliationSummaryPart2
' Author: Matt Behnke
' Created: 2/1/02
' Description: after fillMonths has been run, this procedure copies the appropriate graphs
'              (interacting systems graphs and temp / pressure graphs) and
'              uses the trendline equations from the system graph to calculate
'              temp_polynomial and the pressure equation
' inputs: none
' Outputs: none


Sub affiliationSummaryPart2()

source = "Affiliation_Summary"
numRows = CountRows(source, 1)

Sheets(source).Cells(3, 19) = "S(X) calculated"
Sheets(source).Cells(3, 20) = "S(Y) calculated"
Sheets(source).Cells(3, 21) = "delta S(X) calculated"
Sheets(source).Cells(3, 22) = "n(X) calculated"
Sheets(source).Cells(3, 23) = "deltanxcalculated"
Sheets(source).Cells(3, 24) = "T_X Saboe Deg. Polynomial"

Call CopyInteractingSystemsGraphs("World")

trendlineA = Sheets(source).Cells(1, 19)
sx_a = firstPartTrendEq(trendlineA)
sx_b = secondPartTrendEq(trendlineA)

'sx_a = firstPartPolyTrendEq(trendlineA)
'sx_b = secondPartPolyTrendEq(trendlineA)
'sx_c = thirdPartPolyTrendEq(trendlineA)

trendlineB = Sheets(source).Cells(1, 20)
sy_a = firstPartPolyTrendEq(trendlineB)
sy_b = secondPartPolyTrendEq(trendlineB)
sy_c = thirdPartPolyTrendEq(trendlineB)

trendline_nX = Sheets(source).Cells(1, 22)
nx_a = firstPartTrendEq(trendline_nX)
nx_b = secondPartTrendEq(trendline_nX)

For i = 4 To numRows
    k = Sheets(source).Cells(i, 1)

    'sX & sY calculated
    Sheets(source).Cells(i, 19) = sx_a * k ^ sx_b 'power equation of entropy
    'Sheets(source).Cells(i, 19) = sx_a * k ^ 2 + sx_b * k + sx_c
    Sheets(source).Cells(i, 20) = sy_a * k ^ 2 + sy_b * k + sy_c
    'nX calculated
    Sheets(source).Cells(i, 22) = nx_a * (k ^ nx_b)

    If Sheets(source).Cells(i - 1, 6).Font.ColorIndex = 3 Then
        'find the first row of the same value
        x = i - 1
        While Sheets(source).Cells(x, 6).Font.ColorIndex = 3
            x = x - 1
        Wend

        previous_SY = Sheets(source).Cells(x, 9) 'S(Y) from previous step
        previous_nX = Sheets(source).Cells(x, 6) 'number of terms in previous step
    Else
        previous_SY = Sheets(source).Cells(i - 1, 9) 'S(Y) from previous step
        previous_nX = Sheets(source).Cells(i - 1, 6) 'number of terms in previous step
    End If

    If i > 4 Then
        'check to see if current S(Y) or current n(X) (num terms) is the same as previous
        'if so then place the value of the calculated S(Y) or n(X) into that spot of similarity
        'mark the spot in red where a calculated value has been substituted.

        If Sheets(source).Cells(i, 9) = previous_SY Then
            Sheets(source).Cells(i, 9) = "=" & col(20) & i 'equals calc'ed value of S(Y)
            Sheets(source).Cells(i, 9).Font.ColorIndex = 3
        End If

        If Sheets(source).Cells(i, 6) = previous_nX Then
            Sheets(source).Cells(i, 6) = "=" & col(22) & i 'equals calc'ed value of n(X)
            Sheets(source).Cells(i, 6).Font.ColorIndex = 3
        End If

        'delta S(X)_calculated
        Sheets(source).Cells(i, 21) = "=" & col(19) & i & "-" & col(19) & i - 1 'cells(i,19) - cells(i-
        'delta n(X)_calculated
        Sheets(source).Cells(i, 23) = "=" & col(22) & i & "-" & col(22) & i - 1 'cells(i,22) - cells(i-
        't(x)_poly = n(X)/S(X)
        Sheets(source).Cells(i, 24) = "=" & col(23) & i & "/" & col(21) & i 'cells(i,23) / cells(i,21)
    End If
Next i

End Sub


' Sub: affiliationSummaryPart3 
' Author: Matt Behnke 
' Created: 2/4/02 






' Description: copies the temp / pressure graphs; uses trendline equations of temp_poly and
'              pressure to get the pressure equation
' inputs: none
' Outputs: none


Sub affiliationSummaryPart3()

source = "Affiliation_Summary"
numRows = CountRows(source, 1)

Sheets(source).Cells(3, 25) = "Press f(T)"
Sheets(source).Cells(1, 24) = "m_P"
Sheets(source).Cells(2, 24) = "b_P"
Sheets(source).Cells(1, 26) = "m_T"
Sheets(source).Cells(2, 26) = "b_T"

Call CopyTempPressGraphs("World")

trendline_Tpoly = Sheets(source).Cells(1, 27)
m_t = firstPartTrendEq(trendline_Tpoly)
b_t = secondPartLinearTrendEq(trendline_Tpoly)
Sheets(source).Cells(1, 27) = m_t
Sheets(source).Cells(2, 27) = b_t

trendline_Press = Sheets(source).Cells(1, 25)
m_p = firstPartTrendEq(trendline_Press)
b_p = secondPartLinearTrendEq(trendline_Press)
Sheets(source).Cells(1, 25) = m_p
Sheets(source).Cells(2, 25) = b_p

For i = 5 To numRows
    Tx_poly = Sheets(source).Cells(i, 24)
    Sheets(source).Cells(i, 25) = b_p + (m_p / m_t) * (Tx_poly - b_t)
Next i

'copy third Graph World_Press_vs_Temp_Saboe
Application.DisplayAlerts = False

Windows("AffiliationMacro.xls").Activate
Sheets("World_Press_vs_Temp_Saboe").Select
Sheets("World_Press_vs_Temp_Saboe").Copy After:=Workbooks(currFilename).Sheets(source)

ActiveChart.SeriesCollection(1).Select

'Pressure per node
ActiveChart.SeriesCollection(1).Values = "=" & source & "!R" & 5 & "C25:R" & numRows & "C25"
ActiveChart.SeriesCollection(1).XValues = "=" & source & "!R" & 5 & "C24:R" & numRows & "C24"

'T(x) poly:
ActiveChart.SeriesCollection(2).Values = "=" & source & "!R" & 5 & "C18:R" & numRows & "C18"
ActiveChart.SeriesCollection(2).XValues = "=" & source & "!R" & 5 & "C24:R" & numRows & "C24"

Application.DisplayAlerts = True
End Sub
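
' Editorial sketch, not part of the original macro: the "Press f(T)" column in
' affiliationSummaryPart3 is consistent with eliminating a shared independent
' variable x from two linear trendlines, T = m_t * x + b_t and P = m_p * x + b_p.
' Solving the first for x and substituting into the second gives
' P = b_p + (m_p / m_t) * (T - b_t). Which variable the two trendlines share is
' not stated in the listing, so that reading is an assumption.
' PressureFromTemp is a hypothetical helper name; it assumes m_t <> 0.
Function PressureFromTemp(ByVal T As Double, ByVal m_p As Double, ByVal b_p As Double, _
                          ByVal m_t As Double, ByVal b_t As Double) As Double
    PressureFromTemp = b_p + (m_p / m_t) * (T - b_t)
End Function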


' Sub: copyTempPressGraphs
' Author: Matt Behnke
' Created: 2/4/02
' Description: copies the interacting systems graphs from the affiliation macro workbook.
' inputs: band name
' Outputs:


Sub CopyTempPressGraphs(ByVal band As String)

Application.DisplayAlerts = False

If band = "World" Then
    source = "Affiliation_Summary"
Else
    source = "Affiliation_Summary_" & band
End If

numRows = CountRows(source, 1)

'GRAPH ONE XY Temp
Windows("AffiliationMacro.xls").Activate
Sheets("XY_Temp").Select
Sheets("XY_Temp").Copy After:=Workbooks(currFilename).Sheets(source)
ActiveChart.SeriesCollection(1).Select

j = 4

While Not (i > 0#) 'if the first time step's mean is zero find the step that doesn't have 0
    i = Sheets(source).Cells(j, 3).Value
    If Not (i > 0) Then
        j = j + 1
    End If
Wend

'X-Category
'msgs per node
ActiveChart.SeriesCollection(1).Values = "=" & source & "!R" & j & "C18:R" & numRows & "C18"
ActiveChart.SeriesCollection(1).XValues = "=" & source & "!R" & j & "C1:R" & numRows & "C1"

't(x)
ActiveChart.SeriesCollection(2).Values = "=" & source & "!R" & j & "C14:R" & numRows & "C14"
ActiveChart.SeriesCollection(2).XValues = "=" & source & "!R" & j & "C1:R" & numRows & "C1"

't(x) poly
ActiveChart.SeriesCollection(3).Values = "=" & source & "!R" & j & "C24:R" & numRows & "C24"
ActiveChart.SeriesCollection(3).XValues = "=" & source & "!R" & j & "C18:R" & numRows & "C18"

With ActiveChart.SeriesCollection(2).Trendlines(1)
    'put trendline equation onto stats sheet for T(X)_poly
    .DisplayEquation = True
    .DisplayRSquared = True
End With

With ActiveChart.SeriesCollection(3).Trendlines(1)
    'put trendline equation onto stats sheet for T(X)_poly
    .DisplayEquation = True
    .DisplayRSquared = True
    Worksheets(source).Cells(1, 27).Value = .DataLabel.Text
End With

'copy second graph XY_Press
Windows("AffiliationMacro.xls").Activate
Sheets("XY_Press").Select
Sheets("XY_Press").Copy After:=Workbooks(currFilename).Sheets(source)
ActiveChart.SeriesCollection(1).Select

'Pressure per node
ActiveChart.SeriesCollection(1).Values = "=" & source & "!R" & j & "C18:R" & numRows & "C18"
ActiveChart.SeriesCollection(1).XValues = "=" & source & "!R" & j & "C1:R" & numRows & "C1"

'T(x) poly:
ActiveChart.SeriesCollection(2).Values = "=" & source & "!R" & j & "C24:R" & numRows & "C24"
ActiveChart.SeriesCollection(2).XValues = "=" & source & "!R" & j & "C18:R" & numRows & "C18"

With ActiveChart.SeriesCollection(2).Trendlines(1)
    'put trendline equation onto stats sheet for pressure fit
    .DisplayEquation = True
    .DisplayRSquared = True
End With

With ActiveChart.SeriesCollection(1).Trendlines(1)
    'put trendline equation onto stats sheet for pressure fit
    .DisplayEquation = True
    .DisplayRSquared = True
    Worksheets(source).Cells(1, 25).Value = .DataLabel.Text
End With

Application.DisplayAlerts = True
End Sub 'copy temp/press graphs


' Sub: copyInteractingSystemsGraphs
' Author: Matt Behnke
' Created: 2/1/02
' Description: copies the interacting systems graphs from the affiliation macro workbook.
' inputs: band name
' Outputs:


Sub CopyInteractingSystemsGraphs(ByVal band As String)

Application.DisplayAlerts = False

If band = "World" Then
    source = "Affiliation_Summary"
Else
    source = "Affiliation_Summary_" & band
End If

numRows = CountRows(source, 1)

'GRAPH ONE S_2Interacting Systems
Windows("AffiliationMacro.xls").Activate
Sheets("S_2Interacting systems").Select
Sheets("S_2Interacting systems").Copy After:=Workbooks(currFilename).Sheets(source)
ActiveChart.SeriesCollection(1).Select

j = 4

While Not (i > 0#) 'if the first time step's mean is zero find the step that doesn't have 0
    i = Sheets(source).Cells(j, 3).Value
    If Not (i > 0) Then
        j = j + 1
    End If
Wend

'X-Category
'S(Y)
ActiveChart.SeriesCollection(1).Values = "=" & source & "!R" & j & "C9:R" & numRows & "C9"
ActiveChart.SeriesCollection(1).XValues = "=" & source & "!R" & j & "C6:R" & numRows & "C6"

'S(X):
ActiveChart.SeriesCollection(2).Values = "=" & source & "!R" & j & "C8:R" & numRows & "C8"
ActiveChart.SeriesCollection(2).XValues = "=" & source & "!R" & j & "C6:R" & numRows & "C6"

'S(X,Y)
ActiveChart.SeriesCollection(3).Values = "=" & source & "!R" & j & "C10:R" & numRows & "C10"
ActiveChart.SeriesCollection(3).XValues = "=" & source & "!R" & j & "C6:R" & numRows & "C6"

'S(X;Y)
ActiveChart.SeriesCollection(4).Values = "=" & source & "!R" & j & "C11:R" & numRows & "C11"
ActiveChart.SeriesCollection(4).XValues = "=" & source & "!R" & j & "C6:R" & numRows & "C6"

With ActiveChart.SeriesCollection(1).Trendlines(1)
    'put trendline equation onto stats sheet for S(y)
    .DisplayEquation = True
    .DisplayRSquared = True
    Worksheets(source).Cells(1, 20).Value = .DataLabel.Text
End With

With ActiveChart.SeriesCollection(2).Trendlines(1)
    'put trendline equation onto stats sheet for S(x)
    .DisplayEquation = True
    .DisplayRSquared = True
    Worksheets(source).Cells(1, 19).Value = .DataLabel.Text
End With

'copy second graph World_(X)Temp_S_2
Windows("AffiliationMacro.xls").Activate
Sheets("World_(X)Temp_S_2").Select
Sheets("World_(X)Temp_S_2").Copy After:=Workbooks(currFilename).Sheets(source)
ActiveChart.SeriesCollection(1).Select

'S(X) vs T_X
ActiveChart.SeriesCollection(1).Values = "=" & source & "!R" & j & "C8:R" & numRows & "C8"
ActiveChart.SeriesCollection(1).XValues = "=" & source & "!R" & j & "C14:R" & numRows & "C14"

'S(X;Y):
ActiveChart.SeriesCollection(2).Values = "=" & source & "!R" & j & "C11:R" & numRows & "C11"
ActiveChart.SeriesCollection(2).XValues = "=" & source & "!R" & j & "C6:R" & numRows & "C6"

With ActiveChart.SeriesCollection(1).Trendlines(1)
    'put trendline equation onto stats sheet for S(y)
    .DisplayEquation = True
    .DisplayRSquared = True
End With

With ActiveChart.SeriesCollection(2).Trendlines(1)
    'put trendline equation onto stats sheet for S(x)
    .DisplayEquation = True
    .DisplayRSquared = True
End With

'copy third Graph n_Msg_2Interacting systems
Windows("AffiliationMacro.xls").Activate
Sheets("n_Msg_2Interacting systems").Select
Sheets("n_Msg_2Interacting systems").Copy After:=Workbooks(currFilename).Sheets(source)
ActiveChart.SeriesCollection(1).Select

'n_X
ActiveChart.SeriesCollection(1).Values = "=" & source & "!R" & j & "C6:R" & numRows & "C6"

With ActiveChart.SeriesCollection(1).Trendlines(1)
    'put trendline equation onto stats sheet for S(x)
    .DisplayEquation = True
    .DisplayRSquared = True
    Worksheets(source).Cells(1, 22).Value = .DataLabel.Text
End With

'n_Y
ActiveChart.SeriesCollection(2).Values = "=Affiliation_Summary!R" & j & "C7:R" & _
    numRows & "C7"

Application.DisplayAlerts = True

End Sub 'copy interacting systems graphs


' Sub: entropySummary 
' Author: Matt Behnke 
' Created: 11/19/01
' Description: creates the entropy summary sheet for the world and all the bands;
'              shows the local and contribution entropies of each band
' inputs: none
' Outputs: none


Sub entropySummary()

numColumnsInTerms = CountCols("Term_Entropy_Dist_A_Band", 1)
numRowsAband = CountRows("Term_Entropy_Dist_A_Band", 1)
numRowsBband = CountRows("Term_Entropy_Dist_B_Band", 1)
numRowsCband = CountRows("Term_Entropy_Dist_C_Band", 1)
numRowsDband = CountRows("Term_Entropy_Dist_D_Band", 1)
numRowsWorld = CountRows(worldEntropySheet, 1)

Sheets.Add After:=Worksheets(Worksheets.Count)
Sheets(Worksheets.Count).Select
currSheetName = ActiveSheet.Name

Sheets(currSheetName).Move After:=Sheets("D_Band_Stats")

Sheets(currSheetName).Select
ActiveSheet.StandardWidth = 13

Cells(1, 1) = ""

Cells(2, 1) = " "

Cells(3, 1) = "Step" 

Cells(3, 2) = "interval" 

Cells(3, 3) = "ABand Entropy" 

Cells(3, 4) = "A Band Contribution" 

Cells(3, 5) = "A Band Difference" 

Cells(3, 6) = "BBand Entropy" 

Cells(3, 7) = "B Band Contribution" 

Cells(3, 8) = "B Band Difference" 

Cells(3, 9) = "CBand Entropy" 

Cells(3, 10) = "C Band Contribution" 

Cells(3, 11) = "C Band Difference" 

Cells(3, 12) = "DBand Entropy" 

Cells(3, 13) = "D Band Contribution" 

Cells(3, 14) = "D Band Difference" 

Cells(3, 15) = "Sum Band Entropy" 

Cells(3, 16) = "Sum Band Contribution" 

Cells(3, 17) = "World Entropy" 

Cells(3, 18) = "Diff World & Contrib" 

For i = 4 To numColumnsInTerms
Cells(i, 1) = i - 3

Cells(i, 2) = Sheets("Term_Entropy_Dist_A_Band").Cells(1, i)

Cells(i, 3) = Sheets("Term_Entropy_Dist_A_Band").Cells(numRowsAband + 1, i) 
Cells(i, 4) = Sheets("Term_Entropy_Dist_A_Band").Cells(numRowsAband + 2, i) 
Cells(i, 5) = Sheets("Term_Entropy_Dist_A_Band").Cells(numRowsAband + 3, i) 

Cells(i, 6) = Sheets("Term_Entropy_Dist_B_Band").Cells(numRowsBband + 1, i) 
Cells(i, 7) = Sheets("Term_Entropy_Dist_B_Band").Cells(numRowsBband + 2, i) 
Cells(i, 8) = Sheets("Term_Entropy_Dist_B_Band").Cells(numRowsBband + 3, i) 

Cells(i, 9) = Sheets("Term_Entropy_Dist_C_Band").Cells(numRowsCband + 1, i) 
Cells(i, 10) = Sheets("Term_Entropy_Dist_C_Band").Cells(numRowsCband + 2, i) 
Cells(i, 11) = Sheets("Term_Entropy_Dist_C_Band").Cells(numRowsCband + 3, i) 

Cells(i, 12) = Sheets("Term_Entropy_Dist_D_Band").Cells(numRowsDband + 1, i) 
Cells(i, 13) = Sheets("Term_Entropy_Dist_D_Band").Cells(numRowsDband + 2, i) 
Cells(i, 14) = Sheets("Term_Entropy_Dist_D_Band").Cells(numRowsDband + 3, i) 

Cells(i, 15) = Cells(i, 3) + Cells(i, 6) + Cells(i, 9) + Cells(i, 12) 

Cells(i, 16) = Cells(i, 4) + Cells(i, 7) + Cells(i, 10) + Cells(i, 13) 

Cells(i, 17) = Sheets(worldEntropySheet).Cells(numRowsWorld + 1, i)

Cells(i, 18) = Abs(Cells(i, 17) - Cells(i, 16)) 

Next i 

ActiveSheet.Name = "Entropy Summary" 

End Sub 


' Function: FindStringRowInSheet
' Author: Matt Behnke
' Created: 2/28/02
' Description: determines the row of the string in the given sheet; uses the Find function
' inputs: matrixSheet, termName (descriptor), column letter of term in matrixSheet
' Outputs: the row number


Function findStringInSheet(ByVal matrixSheet As String, ByVal termName As String, ByVal _
    column As String) As String

With Worksheets(matrixSheet).Range(column & ":" & column)
    Set C = .Find(termName, LookIn:=xlValues)

    If Not C Is Nothing Then
        firstAddress = C.Address
        temp = Sheets(1).Cells(1, 1)
        Sheets(1).Cells(1, 1) = firstAddress
        theRow = Sheets(1).Cells(1, 1).Characters(4, 5).Text
        Sheets(1).Cells(1, 1) = temp
        findStringInSheet = theRow
    Else
        findStringInSheet = 0
    End If
End With

End Function 'function
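
' Editorial sketch, not part of the original macro: findStringInSheet recovers the
' row by writing the found address into a scratch cell and reading the characters
' from position 4 onward, which assumes an address such as $C$123 with a
' single-letter column. The Range returned by Find also exposes the row directly
' through its Row property. findRowWithFind is a hypothetical name, and
' LookAt:=xlWhole (exact cell match) is an assumption, not a setting taken from
' the original.
Function findRowWithFind(ByVal matrixSheet As String, ByVal termName As String, _
                         ByVal column As String) As Long
    Dim hit As Range
    Set hit = Worksheets(matrixSheet).Range(column & ":" & column).Find( _
        What:=termName, LookIn:=xlValues, LookAt:=xlWhole)
    If hit Is Nothing Then
        findRowWithFind = 0 'not found, same convention as the functions above
    Else
        findRowWithFind = hit.Row
    End If
End Function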


' Function: findStringRowInSheet ****OBSOLETE*** Slow
' Author: Matt Behnke
' Created: 11/16/01
' Description: determines the row of the string in the given sheet
' inputs: sheetname, descriptor, column of desc in datasheet
' Outputs: row number where the value is found


Function findStringRowInSheet(ByVal matrixSheet As String, ByVal termName As String, _
    ByVal columnNum As Integer) As Integer

foundAt = 0
numRows = CountRows(matrixSheet, 1)

For i = 2 To numRows 'assume column header
    If Cells(i, columnNum).Value = termName Then
        foundAt = i
        found = True
        Exit For
    End If
Next i

If found = True Then
    findStringRowInSheet = foundAt
Else
    findStringRowInSheet = 0
End If

End Function


' Subroutine: formatSheetForPrint
' Author: Matt Behnke
' Created: 9/19/01
' Description: formats the sheet to fit on one page wide (legal size paper);
'              adds header and footer to each sheet and sets orientation to landscape
' inputs: none
' Outputs: none

Sub formatSheetForPrint()

'column heading (R11.3) 

With ActiveSheet.PageSetup 
.PrintTitleRows = 

.PrintTitleColumns = "" 

End With 

ActiveSheet.PageSetup.PrintArea = "$A$1:$Y$203" 

With ActiveSheet.PageSetup 
.LeftHeader = "" 

.CenterHeader = "&A in &F" '(R11.4)
.RightHeader = ""
.LeftFooter = "&D" '(R11.5)
.CenterFooter = "Page &P of &N"
.RightFooter = ""
.LeftMargin = Application.InchesToPoints(0.75)
.RightMargin = Application.InchesToPoints(0.75)
.TopMargin = Application.InchesToPoints(1)
.BottomMargin = Application.InchesToPoints(1)
.HeaderMargin = Application.InchesToPoints(0.5)
.FooterMargin = Application.InchesToPoints(0.5)
.PrintHeadings = False
.PrintGridlines = True
.PrintComments = xlPrintNoComments
.CenterHorizontally = False
.CenterVertically = False
.Orientation = xlLandscape
.Draft = False
.PaperSize = xlPaperLetter
.FirstPageNumber = xlAutomatic
.Order = xlDownThenOver 
.BlackAndWhite = False 
.Zoom = False 
.FitToPagesWide = 1 
.FitToPagesTall = 99 
End With 

End Sub 'format sheet for print 

Sub rSquaredtest() 

Call rSquaredSheet("World") 

Call rSquaredSheet("A_Band") 

Call rSquaredSheet("B_Band") 

Call rSquaredSheet("C_Band") 

Call rSquaredSheet("D_Band") 

End Sub 


'(R11.6)
'(R11.1)
'(R11.2)


' rSquaredSheet 
' author: Matt Behnke 
' created 1/3/02 
' creates a new sheet that stores the R*R values of the graphs:

' number of publications over time and cumulative entropy over time 
' Uses the information stored in the affiliation Summary sheets.. 


Sub rSquaredSheet(ByVal band As String)

Sheets.Add After:=Worksheets(Worksheets.Count)
Sheets(Worksheets.Count).Select
currSheet = ActiveSheet.Name
Sheets(currSheet).Select

'set the source of the data
If band = "World" Then
    source = "Affiliation_Summary"
Else
    source = "Affiliation_Summary_" & band
End If

numRows = CountRows(source, 1)

'fill in the header information of the rsquared sheet
Call rSquaredSheetHeader(numRows - 3, currSheet, band)

'determine startrow
startRow = 4

While Not (i > 0#) 'if the first time step's value is zero find the step that isn't 0
    i = Sheets(source).Cells(startRow, 3).Value
    If Not (i > 0) Then
        startRow = startRow + 1
    End If
Wend

counter = 6

For i = startRow To numRows
    'get the month
    j = 1
    While found = False
        testchar = Sheets(source).Cells(i, 2).Characters(j, 1).Text
        If testchar = "/" Then
            found = True
        Else
            j = j + 1
        End If
    Wend

    'month ends at j
    'get the year
    currentMonth = Sheets(source).Cells(i, 2).Characters(1, j - 1).Text
    If currentMonth < 10 Then
        theYear = Sheets(source).Cells(i, 2).Characters(5, 2).Text
    Else
        theYear = Sheets(source).Cells(i, 2).Characters(6, 2).Text
    End If

    'test the year to see if different from before
    If Not theYear = previousYear Then
        startGraphRange = i
        startStep = Sheets(source).Cells(i, 1)
        Sheets(currSheet).Cells(counter, 2) = startStep
        If theYear > 50 Then
            Sheets(currSheet).Cells(counter, 1) = "19" & theYear
        Else
            Sheets(currSheet).Cells(counter, 1) = "20" & theYear
        End If

        Call rSquaredGraph(currSheet, startGraphRange, counter, source)
        counter = counter + 1
    End If

    'update previous year value
    previousYear = theYear
    found = False
Next i

Sheets(currSheet).Name = band & " rSquared Power"

End Sub 'rSquaredSheet
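
' Editorial sketch, not part of the original macro: rSquaredSheet locates the month
' by scanning for the first "/" and then reads the two-digit year at a fixed
' character offset, which assumes interval labels such as 3/1/98 or 11/1/98 with a
' single-digit day. Splitting the label on "/" avoids the fixed offsets.
' FourDigitYear is a hypothetical helper name; the pivot (a two-digit year above 50
' is read as 19xx) is the one used in rSquaredSheet, and the m/d/yy label format is
' an assumption.
Function FourDigitYear(ByVal label As String) As String
    Dim parts() As String
    parts = Split(label, "/")
    If UBound(parts) < 2 Then
        FourDigitYear = "" 'label is not in m/d/yy form
    ElseIf CInt(parts(2)) > 50 Then
        FourDigitYear = "19" & parts(2)
    Else
        FourDigitYear = "20" & parts(2)
    End If
End Function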


' rSquaredHeader
' Author: Matt Behnke
' Created: 1/3/2002
' creates the header columns and formatting for the rsquared sheet

Sub rSquaredSheetHeader(ByVal numSteps As Integer, ByVal rSquaredSheetName As String, _
    ByVal band As String)

Sheets(rSquaredSheetName).Select

Range("A1").Select
ActiveCell.FormulaR1C1 = "R-Squared Values for Ada," & band
Range("A3").Select
ActiveCell.FormulaR1C1 = numSteps & " total steps," & numSteps / 12 & " years."
Range("A4").Select
ActiveCell.FormulaR1C1 = "Number of Publications"
Range("A5").Select
ActiveCell.FormulaR1C1 = "Year"
Range("B5").Select
ActiveCell.FormulaR1C1 = "Starting Step"
Range("C5").Select
ActiveCell.FormulaR1C1 = "Rsquared"
Range("D5").Select
ActiveCell.FormulaR1C1 = "Equation"

Columns("A:J").Select
Selection.ColumnWidth = 13.29
Columns("D:D").Select
Selection.ColumnWidth = 18
Columns("F:F").Select
Selection.ColumnWidth = 18

Range("C5:D5").Select
Selection.Copy
Range("E5").Select
ActiveSheet.Paste
Range("E4").Select
Application.CutCopyMode = False
ActiveCell.FormulaR1C1 = "Entropy"

Range("A5:I5").Select
Selection.Borders(xlDiagonalDown).LineStyle = xlNone
Selection.Borders(xlDiagonalUp).LineStyle = xlNone
Selection.Borders(xlEdgeLeft).LineStyle = xlNone
Selection.Borders(xlEdgeTop).LineStyle = xlNone
With Selection.Borders(xlEdgeBottom)
    .LineStyle = xlContinuous
    .Weight = xlThin
    .ColorIndex = xlAutomatic
End With
Selection.Borders(xlEdgeRight).LineStyle = xlNone
Selection.Borders(xlInsideVertical).LineStyle = xlNone

Columns("C:C").Select
With Selection.Borders(xlEdgeLeft)
    .LineStyle = xlContinuous
    .Weight = xlThin
    .ColorIndex = xlAutomatic
End With

Columns("E:E").Select
With Selection.Borders(xlEdgeLeft)
    .LineStyle = xlContinuous
    .Weight = xlThin
    .ColorIndex = xlAutomatic
End With

Range("A1").Select
Selection.Font.Bold = True
End Sub 'rsquaredHeader


' Sub: rSquaredGraph
' Author: Matt Behnke
' Date: 1/3/2002
'
' uses a graph to determine the rsquared value and equation..
' uses affiliation summary sheets


Sub rSquaredGraph(ByVal rSquaredSheetName As String, ByVal startGraphRange As Integer, _
ByVal counter As Integer, ByVal source As String)

'trendType = xlLinear
trendType = xlPower

'number of publications
Charts.Add
chartName = ActiveChart.Name
ActiveChart.ChartType = xlLineMarkers
ActiveChart.SetSourceData source:=Sheets(source).Range( _
"C" & startGraphRange & ":C255"), PlotBy:=xlColumns

With ActiveChart
.HasTitle = False
.Axes(xlCategory, xlPrimary).HasTitle = False
.Axes(xlValue, xlPrimary).HasTitle = False
End With

ActiveChart.SeriesCollection(1).Select
ActiveChart.SeriesCollection(1).Trendlines.Add(Type:=trendType, Forward:=0, _
Backward:=0, DisplayEquation:=True, DisplayRSquared:=True).Select

'get trendline rsq and equation for num publications
ActiveChart.SeriesCollection(1).Select
With ActiveChart.SeriesCollection(1).Trendlines(1)
trendEq = .DataLabel.Text
End With

Sheets(source).Select
firstPartEq = firstPartTrendEq(trendEq)
secondPartEq = secondPartTrendEq(trendEq)
rSquared = rSquaredTrendEq(trendEq)
If trendType = xlPower Then
Sheets(rSquaredSheetName).Cells(counter, 4) = "y=" & firstPartEq & "x ^ " & secondPartEq
Else
Sheets(rSquaredSheetName).Cells(counter, 4) = "y=" & firstPartEq & "x + " & secondPartEq
End If

Sheets(rSquaredSheetName).Cells(counter, 3) = rSquared

'entropy
Sheets(chartName).Select

If Not Sheets(source).Cells(startGraphRange, 6) > 0 And trendType = xlPower Then
Sheets(rSquaredSheetName).Cells(counter, 5) = "N/A due to zero entropy"
Else
ActiveChart.SetSourceData source:=Sheets(source).Range( _
"F" & startGraphRange & ":F255"), PlotBy:=xlColumns

'get trendline rsq and equation for entropy
ActiveChart.SeriesCollection(1).Select
With ActiveChart.SeriesCollection(1).Trendlines(1)
trendEq = .DataLabel.Text
End With

Sheets(source).Select
firstPartEq = firstPartTrendEq(trendEq)
secondPartEq = secondPartTrendEq(trendEq)
rSquared = rSquaredTrendEq(trendEq)

If trendType = xlPower Then
Sheets(rSquaredSheetName).Cells(counter, 6) = "y=" & firstPartEq & "x ^ " & secondPartEq
Else
Sheets(rSquaredSheetName).Cells(counter, 6) = "y=" & firstPartEq & "x + " & secondPartEq
End If

Sheets(rSquaredSheetName).Cells(counter, 5) = rSquared
End If

'delete chart
Sheets(rSquaredSheetName).Select
Application.DisplayAlerts = False
Sheets(chartName).Delete
Application.DisplayAlerts = True

End Sub 'rSquaredGraph


' Function: rSquaredTrendEq
' Author: Matt Behnke
' Created: 1/3/02
'
' Description: extracts the rSquared value of a trendline equation
' inputs: trendline equation
' Outputs: rSquared value of the trendline equation


Function rSquaredTrendEq(ByVal trendlineEq As String) As Double

tempStorage = Cells(1, 1)
Cells(1, 1) = trendlineEq

i = 1
While found = False
testchar = Cells(1, 1).Characters(i, 1).Text
If testchar = "R" Then
found = True
Else
i = i + 1
End If
Wend

'i = location of R
'rSquared value starts at character i plus 5..
'extract 6 characters..

rSquaredTrendEq = Cells(1, 1).Characters(i + 5, 6).Text
Cells(1, 1) = tempStorage

End Function ' rSquaredTrendEq
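
' Illustrative sketch, not part of the original macros.
' rSquaredTrendEq locates the "R" marker by writing the trendline label into
' cell A1 and scanning it one character at a time. Assuming the label keeps
' the layout that function relies on (the value begins five characters after
' the "R"), the same offset can be read from the string in memory with InStr
' and Mid, which leaves the worksheet untouched. The name rSquaredFromLabel
' is hypothetical.
Function rSquaredFromLabel(ByVal trendlineEq As String) As Double
    Dim pos As Long
    pos = InStr(1, trendlineEq, "R")        'position of "R", 0 if absent
    If pos = 0 Then Exit Function           'no marker found: result stays 0
    'read up to 6 characters starting 5 places after "R"; Val stops at the
    'first character that is not part of a number
    rSquaredFromLabel = Val(Mid(trendlineEq, pos + 5, 6))
End Function
' Unlike the cell-scanning loop, this version terminates when the label
' contains no "R" instead of running past the end of the text.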


' Function: CountRows
' Author: ? Revised by: Matt Behnke
' Created: ?
' Revised: 9/10/01
'
' Description: Counts the rows in the supplied worksheet and column number
' inputs: sheetName - name of the sheet to count the rows in
' colNum - number of the column to count rows in
' Outputs: number of rows as a double


Function CountRows(ByVal sheetName As String, ByVal colNum As Integer) As Double 

On Error Resume Next 

Dim currCell As Range, rowNum As Double 

Sheets("" & sheetName).Select 

If IsNumeric(colNum) Then 
Else

colNum = 1
End If 

rowNum = 1 

Set currCell = ActiveSheet.Cells(rowNum, colNum) 
Do While currCell.Value <> "" 
rowNum = rowNum + 1 

Set currCell = ActiveSheet.Cells(rowNum, colNum) 
Loop 

CountRows = rowNum - 1 
End Function 'CountRows 


' Function: CountCols
' Author: ? Revised by: Matt Behnke
' Created: ?
' Revised: 9/10/01
'
' Description: Counts the columns in the supplied worksheet and row number
' inputs: sheetName - name of the sheet to count the columns in
' rowNum - number of the row to count columns in
' Outputs: number of columns as an integer


Function CountCols(ByVal sheetName As String, ByVal rowNum As Integer) As Integer 

On Error Resume Next 

Dim currCell As Range, colNum As Integer 

Sheets("" & sheetName).Select 

If IsNumeric(rowNum) Then 
Else 

rowNum = 1 
End If 
colNum = 1 

Set currCell = ActiveSheet.Cells(rowNum, colNum) 

Do While currCell.Value <> ""
colNum = colNum + 1 

Set currCell = ActiveSheet.Cells(rowNum, colNum) 

Loop 

CountCols = colNum - 1 
End Function 'CountCols 


' Function: firstPartTrendEq
' Author: Matt Behnke
' Created: 11/13/01
'
' Description: extracts the first part of the given POWER trendline equation, works w/ linear
' inputs: trendline equation
' Outputs: firstpart of trendline equation


Function firstPartTrendEq(ByVal trendlineEq As String) As Double
tempStorage = Sheets(dataSheet).Cells(1, 1)
Sheets(dataSheet).Cells(1, 1) = trendlineEq
i = 1

While found = False
testchar = Sheets(dataSheet).Cells(1, 1).Characters(i, 1).Text
If testchar = "x" Then
found = True
Else
i = i + 1
End If
Wend

'i = location of x
'firstpart = starts at character 5
'num of characters = location(x) - 5

firstPartTrendEq = Sheets(dataSheet).Cells(1, 1).Characters(5, i - 5).Text
Sheets(dataSheet).Cells(1, 1) = tempStorage

End Function ' first part trendline


' Function: secondPartTrendEq
' Author: Matt Behnke
' Created: 11/13/01
'
' Description: extracts the second part of the given POWER trendline equation
' inputs: trendline equation
' Outputs: secondpart of trendline equation


Function secondPartTrendEq(ByVal trendlineEq As String) As Double

tempStorage = Sheets(dataSheet).Cells(1, 1)
Sheets(dataSheet).Cells(1, 1) = trendlineEq

i = 1
While found = False
testchar = Sheets(dataSheet).Cells(1, 1).Characters(i, 1).Text
If testchar = "x" Then
found = True
Else
i = i + 1
End If
Wend

'i = location of x
'secondpart starts at character i plus 1..
'extract 5 characters..

secondPartTrendEq = Sheets(dataSheet).Cells(1, 1).Characters(i + 1, 5).Text
Sheets(dataSheet).Cells(1, 1) = tempStorage
End Function ' secondPart eq


' Function: secondPartLinearTrendEq
' Author: Matt Behnke
' Created: 2/5/02
'
' Description: extracts the second part of the given linear trendline equation
' inputs: trendline equation
' Outputs: secondPart of trendline equation


Function secondPartLinearTrendEq(ByVal trendlineEq As String) As Double

tempStorage = Sheets(dataSheet).Cells(1, 1)
Sheets(dataSheet).Cells(1, 1) = trendlineEq

i = 1
While found = False
testchar = Sheets(dataSheet).Cells(1, 1).Characters(i, 1).Text
If testchar = "x" Then
found = True
Else
i = i + 1
End If
Wend

'i = location of x
'secondpart starts at character i plus 1..
' 4.143x + 2.4441
'       ^^^^^^^^^
'extract 9 characters..

secondPartLinearTrendEq = Sheets(dataSheet).Cells(1, 1).Characters(i + 1, 9).Text
Sheets(dataSheet).Cells(1, 1) = tempStorage

End Function ' secondPart eq


' Function: firstPartPolyTrendEq
' Author: Matt Behnke
' Created: 2/1/02
'
' Description: extracts the first part of the given trendline equation
' of the form ax^2 + bx + c
' inputs: trendline equation
' Outputs: firstpart of trendline equation


Function firstPartPolyTrendEq(ByVal trendlineEq As String) As Double

tempStorage = Sheets(dataSheet).Cells(1, 1)
Sheets(dataSheet).Cells(1, 1) = trendlineEq
i = 1

While found = False
testchar = Sheets(dataSheet).Cells(1, 1).Characters(i, 1).Text
If testchar = "x" Then
found = True
Else
i = i + 1
End If
Wend

'i = location of x
'firstpart = starts at character 5
'num of characters = location(x) - 5

firstPartPolyTrendEq = Sheets(dataSheet).Cells(1, 1).Characters(5, i - 5).Text
Sheets(dataSheet).Cells(1, 1) = tempStorage

End Function ' first part poly order - 2 trendline


' Function: secondPartPolyTrendEq
' Author: Matt Behnke
' Created: 2/1/02
'
' Description: extracts the second part of the given second-order polynomial trendline equation
' inputs: trendline equation
' Outputs: secondpart of trendline equation


Function secondPartPolyTrendEq(ByVal trendlineEq As String) As Double

tempStorage = Sheets(dataSheet).Cells(1, 1)
Sheets(dataSheet).Cells(1, 1) = trendlineEq

i = 1
While found = False
testchar = Sheets(dataSheet).Cells(1, 1).Characters(i, 1).Text
If testchar = "x" Then
found = True
Else
i = i + 1
End If
Wend

j = i + 1
While found2 = False
testchar = Sheets(dataSheet).Cells(1, 1).Characters(j, 1).Text
If testchar = "x" Then
found2 = True
Else
j = j + 1
End If
Wend

'i = location of first x
'j = location of second x
'secondpart starts at character i plus 5..
'      i12345    j1234
'1.0000x2 + 2.001x + 8.878
'num of characters = j - (i + 5)

secondPartPolyTrendEq = Sheets(dataSheet).Cells(1, 1).Characters(i + 5, j - (i + 5)).Text
Sheets(dataSheet).Cells(1, 1) = tempStorage

End Function ' secondPart poly eq


' Function: thirdPartPolyTrendEq
' Author: Matt Behnke
' Created: 2/1/02
'
' Description: extracts the third part of the given second-order polynomial trendline equation
' inputs: trendline equation
' Outputs: thirdpart of trendline equation


Function thirdPartPolyTrendEq(ByVal trendlineEq As String) As Double

tempStorage = Sheets(dataSheet).Cells(1, 1)
Sheets(dataSheet).Cells(1, 1) = trendlineEq

i = 1
While found = False
testchar = Sheets(dataSheet).Cells(1, 1).Characters(i, 1).Text
If testchar = "x" Then
found = True
Else
i = i + 1
End If
Wend

j = i + 1
While found2 = False
testchar = Sheets(dataSheet).Cells(1, 1).Characters(j, 1).Text
If testchar = "x" Then
found2 = True
Else
j = j + 1
End If
Wend

'i = location of first x
'j = location of second x
'secondpart starts at character i plus 5..
'      i12345    j1234
'1.0000x2 + 2.001x + 8.878
'third part starts at character j plus 4..
'extract 5 characters..

thirdPartPolyTrendEq = Sheets(dataSheet).Cells(1, 1).Characters(j + 4, 5).Text
Sheets(dataSheet).Cells(1, 1) = tempStorage

End Function ' thirdPart poly eq


' Function: col
' Author: Matt Behnke
' Created: 9/11/01
'
' Description: changes column number into a letter
' inputs: columnNumber
' Outputs: column letter


Function col(ByVal columnNumber As Integer) As String 

Select Case columnNumber 
Case 1 
col = "A" 

Case 2 
col = "B" 

Case 3 
col = "C" 

Case 4 
col = "D" 

Case 5 
col = "E" 

Case 6 
col="F" 

Case 7 
col = "G" 

Case 8 
col = "H" 

Case 9 
col = "I" 

Case 10 
col="J" 

Case 11 
col = "K" 

Case 12 
col = "L" 

Case 13 
col = "M"
Case 14 
col = "N" 
Case 15 
col = "O" 
Case 16 
col= "P" 
Case 17 
col = "Q" 
Case 18 
col = "R" 
Case 19 
col="S" 
Case 20 
col = "T" 
Case 21 
col = "U" 
Case 22 
col = "V" 
Case 23 
col = "W" 
Case 24 
col = "X" 
Case 25 
col = "Y" 
Case 26 
col = "Z" 
Case 27 
col = "AA" 
Case 28 
col = "AB" 
Case 29 
col = "AC" 
Case 30 
col = "AD" 
Case 31 
col = "AE" 
Case 32 
col = "AF" 
Case 33 
col = "AG" 
Case 34 
col = "AH" 
Case 35 
col = "AI"
Case 36 
col = "AJ" 
Case 37 
col = "AK" 
Case 38 
col = "AL"
Case 39 
col = "AM" 
Case 40 
col = "AN" 
Case 41 
col = "AO" 
Case 42 
col = "AP" 
Case 43 
col = "AQ" 
Case 44 
col = "AR" 
Case 45 
col = "AS" 
Case 46 
col = "AT" 
Case 47 
col = "AU" 
Case 48 
col = "AV" 
Case 49 
col = "AW" 
Case 50 
col = "AX" 
Case 51 
col = "AY" 
Case 52 
col = "AZ" 
Case 53 
col = "BA" 
Case 54 
col = "BB" 
Case 55 
col = "BC" 
Case 56 
col = "BD" 
Case 57 
col = "BE" 
Case 58 
col = "BF" 
Case 59 
col = "BG" 
Case 60 
col = "BH" 
Case 61 
col = "BI" 
Case 62 
col = "BJ" 
Case 63 
col = "BK"
Case 64 
col = "BL" 
Case 65 
col = "BM" 
Case 66 
col = "BN" 
Case 67 
col = "BO" 
Case 68 
col = "BP" 
Case 69 
col = "BQ" 
Case 70 
col = "BR" 
Case 71 
col = "BS" 
Case 72 
col = "BT" 
Case 73 
col = "BU" 
Case 74 
col = "BV" 
Case 75 
col = "BW" 
Case 76 
col = "BX" 
Case 77 
col = "BY" 
Case 78 
col = "BZ" 
Case 79 
col = "CA" 
Case 80 
col = "CB" 
Case 81 
col = "CC" 
Case 82 
col = "CD" 
Case 83 
col = "CE" 
Case 84 
col = "CF" 
Case 85 
col = "CG" 
Case 86 
col = "CH" 
Case 87 
col = "CI"
Case 88 
col = "CJ"
Case 89 
col = "CK" 
Case 90 
col = "CL" 
Case 91 
col = "CM" 
Case 92 
col = "CN" 
Case 93 
col = "CO" 
Case 94 
col = "CP" 
Case 95 
col = "CQ" 
Case 96 
col = "CR" 
Case 97 
col = "CS" 
Case 98 
col = "CT" 
Case 99 
col = "CU" 
Case 100 
col = "CV" 
Case 101 
col = "CW" 
Case 102 
col = "CX" 
Case 103 
col = "CY" 
Case 104 
col = "CZ" 
Case 105 
col = "DA" 
Case 106 
col = "DB" 
Case 107 
col = "DC" 
Case 108 
col = "DD" 
Case 109 
col = "DE" 
Case 110 
col = "DF" 
Case 111 
col = "DG" 
Case 112 
col = "DH" 
Case 113 
col = "DI"
Case 114 
col = "DJ" 
Case 115 
col = "DK" 
Case 116 
col = "DL" 
Case 117 
col = "DM" 
Case 118 
col = "DN" 
Case 119 
col = "DO" 
Case 120 
col = "DP" 
Case 121 
col = "DQ" 
Case 122 
col = "DR" 
Case 123 
col = "DS" 
Case 124 
col = "DT" 
Case 125 
col = "DU" 
Case 126 
col = "DV" 
Case 127 
col = "DW" 
Case 128 
col = "DX" 
Case 129 
col = "DY" 
Case 130 
col = "DZ" 
Case 131 
col = "EA" 
Case 132 
col = "EB" 
Case 133 
col = "EC" 
Case 134 
col = "ED" 
Case 135 
col = "EE" 
Case 136 
col = "EF" 
Case 137 
col = "EG" 


Case 138 
col = "EH"
Case 139 
col = "EI"
Case 140 
col = "EJ" 
Case 141 
col = "EK" 
Case 142 
col = "EL" 
Case 143 
col = "EM" 
Case 144 
col = "EN" 
Case 145 
col = "EO" 
Case 146 
col = "EP" 
Case 147 
col = "EQ" 
Case 148 
col = "ER" 
Case 149 
col = "ES" 
Case 150 
col = "ET" 
Case 151 
col = "EU" 
Case 152 
col = "EV" 
Case 153 
col = "EW" 
Case 154 
col = "EX" 
Case 155 
col = "EY" 
Case 156 
col = "EZ" 
Case 157 
col = "FA" 
Case 158 
col = "FB" 
Case 159 
col = "FC" 
Case 160 
col = "FD" 
Case 161 
col = "FE" 
Case 162 
col = "FF" 
Case 163 
col = "FG"
Case 164 
col = "FH" 
Case 165 
col = "FI"
Case 166 
col = "FJ" 
Case 167 
col = "FK" 
Case 168 
col = "FL" 
Case 169 
col = "FM" 
Case 170 
col = "FN" 
Case 171 
col = "FO" 
Case 172 
col = "FP" 
Case 173 
col = "FQ" 
Case 174 
col = "FR" 
Case 175 
col = "FS" 
Case 176 
col = "FT" 
Case 177 
col = "FU" 
Case 178 
col = "FV" 
Case 179 
col = "FW" 
Case 180 
col = "FX" 
Case 181 
col = "FY" 
Case 182 
col = "FZ" 
Case 183 
col = "GA" 
Case 184 
col = "GB" 
Case 185 
col = "GC" 
Case 186 
col = "GD" 
Case 187 
col = "GE" 
Case 188 
col = "GF"
Case 189 
col = "GG" 
Case 190 
col = "GH" 
Case 191 
col = "GI" 
Case 192 
col = "GJ" 
Case 193 
col = "GK" 
Case 194 
col = "GL" 
Case 195 
col = "GM" 
Case 196 
col = "GN" 
Case 197 
col = "GO" 
Case 198 
col = "GP" 
Case 199 
col = "GQ" 
Case 200 
col = "GR" 
Case 201 
col = "GS" 
Case 202 
col = "GT" 
Case 203 
col = "GU" 
Case 204 
col = "GV" 
Case 205 
col = "GW" 
Case 206 
col = "GX" 
Case 207 
col = "GY" 
Case 208 
col = "GZ" 
Case 209 
col = "HA" 
Case 210 
col = "HB" 
Case 211 
col = "HC" 
Case 212 
col = "HD" 
Case 213 
col = "HE"
Case 214 
col = "HF" 
Case 215 
col = "HG" 
Case 216 
col = "HH" 
Case 217 
col = "HI" 
Case 218 
col = "HJ" 
Case 219 
col = "HK" 
Case 220 
col = "HL" 
Case 221 
col = "HM" 
Case 222 
col = "HN" 
Case 223 
col = "HO" 
Case 224 
col = "HP" 
Case 225 
col = "HQ" 
Case 226 
col = "HR" 
Case 227 
col = "HS" 
Case 228 
col = "HT" 
Case 229 
col = "HU" 
Case 230 
col = "HV" 
Case 231 
col = "HW" 
Case 232 
col = "HX" 
Case 233 
col = "HY" 
Case 234 
col = "HZ" 
Case 235 
col = "IA" 
Case 236 
col = "IB" 
Case 237 
col = "IC" 
Case 238 
col = "ID"
Case 239 
col = "IE" 
Case 240 
col="IF" 
Case 241 
col = "IG" 
Case 242 
col = "IH" 
Case 243 
col = "II" 
Case 244 
col = "IJ"
Case 245 
col = "IK" 
Case 246 
col = "IL" 
Case 247 
col = "IM" 
Case 248 
col = "IN" 
Case 249 
col = "IO"
Case 250 
col = "IP" 
Case 251 
col = "IQ" 
Case 252 
col = "IR" 
Case 253 
col="IS" 
Case 254 
col = "IT" 
Case 255 
col = "IU" 

Case Else
col = "Z" 
End Select 

End Function 'col 
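
' Illustrative sketch, not part of the original macros.
' The lookup table in col maps column numbers 1 to 255 onto Excel column
' letters. The same mapping can be computed as a base-26 conversion with no
' zero digit, which is what the table enumerates by hand. The name colLetter
' is hypothetical.
Function colLetter(ByVal columnNumber As Long) As String
    Dim n As Long, r As Long
    n = columnNumber
    Do While n > 0
        r = (n - 1) Mod 26                  '0..25 selects A..Z
        colLetter = Chr(65 + r) & colLetter 'prepend the next letter
        n = (n - 1) \ 26                    'integer division moves one place left
    Loop
End Function
' For example, colLetter(27) gives "AA" and colLetter(255) gives "IU",
' matching the table. One difference: for values below 1 this sketch returns
' an empty string, whereas col falls through to its default of "Z".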


Public Function Update_Mathcad_Band_Stats(ByVal mathcad_sheet_name As String, _
ByVal data_sheet_name As String, ByVal start_cell_x As Variant, _
ByVal start_cell_y As Variant, ByVal num_rows As Integer, ByVal fit_type As Integer, _
ByVal tolerance As Double) As Variant

' Function: Update_Mathcad_Band_Stats
' Author: Aaron Micyus
' Last Modified: 12/05/2001
'
' Description: Given location information for input data this subroutine
' passes data to embedded mathcad object for processing and returns
' obtained values
' inputs:
' mathcad_sheet_name : This is the name of the sheet the embedded
' object is in
' data_sheet_name : This is the name of the sheet we will obtain data
' from
' start_cell_x : This is the cell location we start getting x data from
' start_cell_y : This is the cell location we start getting y data from
' num_rows : This is the number of data rows we have
' fit_type : integer value corresponding to fit type to return
' 0 - Hyperbolic 3 parameter (k,p,r)
' 1 - Exponential 3 parameter (k,p,r)
' 2 - Power 2 parameter (b,m)
' tolerance : tolerance value for fit
' Outputs:
' array : returned array will hold calculated k,p,r values
' [1] element one : k
' [2] element two : r
' [3] element three: p
' [4] element four : Rsquared value

'VARS
Dim Mathcad As Object 'our interface to the Mathcad
'embedded object
Dim data_x_real, data_x_imag As Variant 'vars for real and imaginary
'components of x data
Dim data_y_real, data_y_imag As Variant 'vars for real and imaginary
'components of y data
Dim tolerance_real, tolerance_imag As Variant 'vars for real and imaginary
'components of tolerance
Dim k_real, k_imag As Variant 'vars for real and imag
'k values from mathcad
Dim r_real, r_imag As Variant 'vars for real and imag
'r values from mathcad
Dim p_real, p_imag As Variant 'vars for real and imag
'p values from mathcad
Dim b_real, b_imag As Variant 'vars for real and imag
'b values from mathcad
Dim m_real, m_imag As Variant 'vars for real and imag
'm values from mathcad
Dim rsquared_real, rsquared_imag As Variant 'vars for real and imag
'r squared values from mathcad
Dim current_char_position As Variant 'temp var to hold position in string
Dim range_x, range_y As Variant 'vars for calculated ranges
Dim fit_results(4) As Variant 'array to hold returned fit data from mathcad

'initialize embedded mathcad
Call Register_Mathcad_OLE(mathcad_sheet_name)

'activate the sheet with the embedded mathcad object
Worksheets(mathcad_sheet_name).Activate

'get object reference
Set Mathcad = ActiveSheet.OLEObjects(1).Object

'activate the sheet with the data
Worksheets(data_sheet_name).Activate

'construct the x value range

'this temp variable holds current position in string we are parsing through
current_char_position = 1

'traverse the string until we find a numeric character
While Not IsNumeric(Mid(start_cell_x, current_char_position, 1))
current_char_position = current_char_position + 1
Wend

'calculate range string for x
range_x = start_cell_x & ":" & Left(start_cell_x, 1)
range_x = range_x & (Right(start_cell_x, (Len(start_cell_x) - current_char_position + 1)) + _
num_rows - 1)

'now construct y value range

'this temp variable holds current position in string we are parsing through
current_char_position = 1

'traverse the string until we find a numeric character
While Not IsNumeric(Mid(start_cell_y, current_char_position, 1))
current_char_position = current_char_position + 1
Wend

'calculate range string for y
range_y = start_cell_y & ":" & Left(start_cell_y, 1)
range_y = range_y & (Right(start_cell_y, (Len(start_cell_y) - current_char_position + 1)) + _
num_rows - 1)

'set ranges for input data for mathcad
data_x_real = ActiveSheet.Range(range_x).Value
data_x_imag = Empty

data_y_real = ActiveSheet.Range(range_y).Value
data_y_imag = Empty

tolerance_real = tolerance 'obtained from parameter
tolerance_imag = Empty

'import values into mathcad
Call Mathcad.SetComplex("X_in", data_x_real, data_x_imag)
Call Mathcad.SetComplex("Y_in", data_y_real, data_y_imag)
Call Mathcad.SetComplex("eTOL", tolerance_real, tolerance_imag)

'have mathcad recalculate sheet
Call Mathcad.Recalculate

If fit_type = HYP3_FIT Then

'get values from mathcad for excel
Call Mathcad.GetComplex("out0", k_real, k_imag)
Call Mathcad.GetComplex("out1", r_real, r_imag)
Call Mathcad.GetComplex("out2", p_real, p_imag)
Call Mathcad.GetComplex("out3", rsquared_real, rsquared_imag)

'fill array with results
fit_results(1) = k_real
fit_results(2) = r_real
fit_results(3) = p_real
fit_results(4) = rsquared_real

ElseIf fit_type = EXP3_FIT Then

'get values from mathcad for excel
Call Mathcad.GetComplex("out4", k_real, k_imag)
Call Mathcad.GetComplex("out5", r_real, r_imag)
Call Mathcad.GetComplex("out6", p_real, p_imag)
Call Mathcad.GetComplex("out7", rsquared_real, rsquared_imag)

'fill array with results
fit_results(1) = k_real
fit_results(2) = r_real
fit_results(3) = p_real
fit_results(4) = rsquared_real

ElseIf fit_type = POW2_FIT Then

'get values from mathcad for excel
Call Mathcad.GetComplex("out8", b_real, b_imag)
Call Mathcad.GetComplex("out9", m_real, m_imag)
Call Mathcad.GetComplex("out10", rsquared_real, rsquared_imag)

'fill array with results
fit_results(1) = b_real
fit_results(2) = m_real
fit_results(3) = Empty
fit_results(4) = rsquared_real

End If

Update_Mathcad_Band_Stats = fit_results

'end of Update_Mathcad_Band_Stats
End Function
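
' Illustrative sketch, not part of the original macros.
' Update_Mathcad_Band_Stats builds the range strings for the x and y data by
' hand: it takes the first letter of the start cell as the column and adds
' num_rows - 1 to the row digits. Assuming the start cell is a valid address
' on the active sheet, Range.Resize and Range.Address can produce the same
' string and also cope with two-letter columns such as "AB2". The name
' dataRangeAddress is hypothetical.
Function dataRangeAddress(ByVal startCell As String, ByVal numRows As Long) As String
    'grow the single start cell to numRows rows by 1 column, then return its
    'address without the absolute-reference "$" signs
    dataRangeAddress = ActiveSheet.Range(startCell).Resize(numRows, 1).Address(False, False)
End Function
' With startCell = "C2" and numRows = 10 this yields "C2:C11", the same form
' as range_x and range_y above.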

Public Function Register_Mathcad_OLE(ByVal mathcad_sheet_name As String)

' register mathcad ole Macro
' opens embedded mathcad document in order for system to recognize it for future macro

Sheets(mathcad_sheet_name).Select
Range("A1").Activate
ActiveSheet.Shapes("Object 1").Select
Selection.Verb Verb:=xlPrimary
Range("A1").Activate
'Sheets("ABandStats").Select
End Function


' Sub: fillMonthsCol
' Author: Matt Behnke
' Created: 12/11/01
'
' Description: fills in the months if they are missing.. Inserts a column used for a matrix sheet
' this doesn't work because there are not enough columns
' inputs: sheetName
' Outputs:


Sub fillMonthsCol() ' ByVal sheetName As String)

sheetName = ActiveSheet.Name
numColumns = CountCols(sheetName, 1)

Dim theMonth As Integer

counter = 1
monthCounter = "" & counter
For i = 4 To numColumns

j = 1
While found = False
testchar = Cells(1, i).Characters(j, 1).Text
If testchar = "/" Then
found = True
Else
j = j + 1
End If
Wend

'month ends at j
currentMonth = Cells(1, i).Characters(1, j - 1).Text
If currentMonth < 10 Then
restofDate = Cells(1, i).Characters(2, 5).Text
Else
restofDate = Cells(1, i).Characters(3, 5).Text
End If

theMonth = currentMonth
If theMonth > monthCounter Then
While theMonth > monthCounter
Range(Cells(1, i), Cells(1, i)).Select
Selection.EntireColumn.Insert

'copy previous column
Columns(col(i - 1) & ":" & col(i - 1)).Select
Selection.Copy
Columns(col(i) & ":" & col(i)).Select
ActiveSheet.Paste
Cells(1, i) = "" & monthCounter & restofDate

counter = counter + 1
If counter = 13 Then
counter = 1
End If

monthCounter = "" & counter
i = i + 1
numColumns = numColumns + 1
Wend
End If

counter = counter + 1
If counter = 13 Then
counter = 1
End If

monthCounter = "" & counter
found = False
Next i
Columns("D:D").ColumnWidth = 6.29
End Sub ' fill months col


' Sub: qLevelSummary
' Author: Matt Behnke
' Created: 1/14/02
'
' Description: creates a summary sheet for a q_level, lists time steps and num of instances
' for each q level
' inputs: qlevelType: author or term
' prefix: the sheet prefix, changes whether its author or term
' numqLevels: the number of q levels
' Outputs:


Sub qLevelSummary(ByVal qLevelType As String, ByVal prefix As String, _
ByVal numqLevels As Integer)

For z = 1 To numqLevels
If z < 10 Then
sheetName = prefix & "0" & z & "_" & qLevelType & "_month"
summarySheet = prefix & "0" & z & "_" & qLevelType & "_summary"
Else
sheetName = prefix & "" & z & "_" & qLevelType & "_month"
summarySheet = prefix & "" & z & "_" & qLevelType & "_summary"
End If

Sheets.Add After:=Sheets(Sheets.Count)
Sheets(Sheets.Count).Select
ActiveSheet.Name = summarySheet

numColumns = CountCols(sheetName, 1)
numRows = CountRows(sheetName, 1)

Sheets(summarySheet).Cells(1, 1) = "q level"
Sheets(summarySheet).Cells(1, 2) = z
Sheets(summarySheet).Cells(2, 1) = " "
Sheets(summarySheet).Cells(3, 1) = "Time Steps"
Sheets(summarySheet).Cells(3, 2) = "sum"
Sheets(summarySheet).Cells(3, 3) = "count"

For i = 4 To numColumns
Sheets(summarySheet).Cells(i, 1) = Sheets(sheetName).Cells(1, i)
Sheets(summarySheet).Cells(i, 2) = "=SUM('" & sheetName & "'!" & col(i) & "2:" & col(i) _
& numRows & ")"
Sheets(summarySheet).Cells(i, 3) = "=Count('" & sheetName & "'!" & col(i) & "2:" & col(i) _
& numRows & ")"
Next i
Next z
End Sub


' Sub: qLevelTrigger
' Author: Matt Behnke
' Created: 1/18/02
'
' Description: activates the qlevel functions
' inputs:
' Outputs:


Sub qLevelTrigger()

qLevelType = InputBox("Enter Author or Term:")
numqLevels = InputBox("Enter number of qLevels:")

If (qLevelType = "Author" Or qLevelType = "Term") And numqLevels > 0 Then 'check input

If qLevelType = "Author" Then 
prefix = 

Else 

prefix = "" 

End If 

'Call qLevelCumulative(qLevelType, prefix, numqLevels) 

' Call qLevelSummary(qLevelType, prefix, numqLevels) 

'Call qLevelYears(qLevelType, prefix, numqLevels) 

' Call qLevelMonths(qLevelType, prefix, numqLevels, False) 

'Call qLevelMonths(qLevelType, prefix, numqLevels, True) 'calculate mass 
' Call qLevelMonthsCount(qLevelType, prefix, numqLevels)

Call qLevelEntropy(qLevelType, prefix, numqLevels) 

'Call qLevelEntropy2(numqLevels) 'wrong!!!!!!!!!!!!!! 

'Call qLevelMonthTemp(numqLevels) 

Else 

MsgBox ("WRONG INPUT.. TRY AGAIN!") 

End If 

End Sub 


' Sub: qLevelCumulative
' Author: Matt Behnke
' Created: 1/16/02
'
' Description: ..puts in cumulative values for each time step, and sums at the bottom
' inputs: qlevelType: author or term
' prefix: the sheet prefix, changes whether its author or term
' numqLevels: the number of q levels
' Outputs:

Sub qLevelCumulative(ByVal qLevelType As String, ByVal prefix As String, _
ByVal numqLevels As Integer)

For i = 1 To numqLevels
If i < 10 Then
Call CalcCumulative(prefix & "0" & i & "_" & qLevelType & "_month")
Else
Call CalcCumulative(prefix & "" & i & "_" & qLevelType & "_month")
End If
Next i

End Sub


' Sub: qLevelYears
' Author: Matt Behnke
' Created: 1/16/02
'
' Description: uses an array of years to store the amount of instances for a year.
' outputs each year of the dataset to each summary sheet and number of instances
' inputs: qlevelType: author or term
' prefix: the sheet prefix, changes whether its author or term
' numqLevels: the number of q levels
' Outputs:


Sub qLevelYears(ByVal qLevelType As String, ByVal prefix As String, ByVal numqLevels As Integer)

'number of ntuples
ntuples = numqLevels

Dim years As Variant

overallQSummary = "q_summary_year"

firstYear = "2500"
firstYearOffset = 6
lastYear = 0

Sheets.Add After:=Sheets(Sheets.Count)
Sheets(Sheets.Count).Select
ActiveSheet.Name = overallQSummary

For z = 1 To ntuples

years = Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
0, 0, 0)

If z < 10 Then
summarySheet = prefix & "0" & z & "_" & qLevelType & "_summary"
Else
summarySheet = prefix & "" & z & "_" & qLevelType & "_summary"
End If

numRows = CountRows(summarySheet, 1)

For i = 4 To numRows
'determine current year
j = 1
While found = False
testchar = Cells(i, 1).Characters(j, 1).Text
If testchar = "/" Then
found = True
Else
j = j + 1
End If
Wend

'month ends at j
currentMonth = Cells(i, 1).Characters(1, j - 1).Text
If currentMonth < 10 Then
theYear = Cells(i, 1).Characters(5, 2).Text
Else
theYear = Cells(i, 1).Characters(6, 2).Text
End If

If theYear > 50 Then 'add prefix to the year
theYear = "19" & theYear
Else
theYear = "20" & theYear
End If

'check to see if the currentYear is less than first year
If theYear < firstYear Then
firstYear = theYear
firstYearOffset = firstYearOffset - 1 'array index
End If

If theYear > lastYear Then
lastYear = theYear
End If

yearOffset = theYear - firstYear + 5 'so there can be 5 years less data than the first firstYear value
years(yearOffset) = Sheets(summarySheet).Cells(i, 2)
found = False
Next i

'output yearsArray
Sheets(summarySheet).Cells(3, 4) = "Years"
Sheets(summarySheet).Cells(3, 5) = "instances"

counter = 4 'for output

For x = firstYearOffset To lastYear - firstYear + firstYearOffset
Sheets(summarySheet).Cells(counter, 4) = firstYear + x - firstYearOffset
Sheets(summarySheet).Cells(counter, 5) = years(x) 'instances
If z = 1 Then
Sheets(overallQSummary).Cells(counter, z) = firstYear + x - firstYearOffset
Sheets(overallQSummary).Cells(counter, z + 1) = 0
End If

Sheets(overallQSummary).Cells(counter, z + 2) = years(x)

'if there is a zero in a year then put the previous year's value into the current year (cumulative)
If Sheets(overallQSummary).Cells(counter, z + 2) = 0 And x > firstYearOffset Then
Sheets(overallQSummary).Cells(counter, z + 2) = Sheets(overallQSummary).Cells(counter - 1, z + 2)
End If

If x = firstYearOffset Then
Sheets(overallQSummary).Cells(counter - 1, z + 2) = z
End If

counter = counter + 1
Next x

Next z

'copy chart
currFilename = Application.ActiveWorkbook.Name
'FIX THIS***
Windows("AffiliationMacro3.xls").Activate
Sheets("q_level_yr").Select
Sheets("q_level_yr").Copy After:=Workbooks(currFilename).Sheets(1)

counter = 4
columnStart = 2
For z = 2 To ntuples
ActiveChart.SeriesCollection(z - 1).Values = "=" & overallQSummary & "!R" & counter & _
"C" & columnStart & ":R" & counter & "C" & ntuples
counter = counter + 1
Next z
End Sub




sub: qLevelMonths 

Author: Matt Behnke 

Created: 1/27/02 -finished 2/4/02 

Description: uses an array of years and months to store the number of instances for each timestep; 
outputs the number of cumulative instances per month / year 
inputs: qlevelType: author or term 
prefix: the sheet prefix, which depends on whether it is author or term 
numqLevels: the number of q levels 
mass: if True, each q level's value is multiplied by its level number 
Outputs: 


Sub qLevelMonths(ByVal qLevelType As String, ByVal prefix As String, _ 
ByVal numqLevels As Integer, ByVal mass As Boolean) 


'number of ntuples 
ntuples = numqLevels 


Dim years As Variant 
Dim yearsMonths As Variant 


firstYear = "2500" 
lastYear = 0 


If mass = True Then 

overallQSummary = "q_summary_monthly_count_mass" 
Else 

overallQSummary = "q_summary_monthly_count" 

End If 


Sheets.Add After:=Sheets(Sheets.Count) 
Sheets(Sheets.Count).Select 
ActiveSheet.Name = overallQSummary 

Sheets(overallQSummary).Cells(1, 1) = " " 
Sheets(overallQSummary).Cells(2, 1) = " " 
Sheets(overallQSummary).Cells(3, 1) = " " 

Cells(4, 2).Select 

ActiveWindow.FreezePanes = True 


years = Array() 
yearsMonths = Array() 

'ReDim yearsMonths(0 To 0, 1 To 12) 

For z = 1 To ntuples 
'get the source sheet's name 
If z < 10 Then 

summarySheet = prefix & "0" & z & "_" & qLevelType & "_summary" 

Else 

summarySheet = prefix & "" & z & "_" & qLevelType & "_summary" 

End If 

numRows = CountRows(summarySheet, 1) 

'scan the first sheet to get the last and first year to get the array bounds 
If z = 1 Then 

For i = 4 To numRows 

'determine current year 

j = 1 

While found = False 

testchar = Cells(i, 1).Characters(j, 1).Text 
If testchar = "/" Then 
found = True 
Else 
j = j + 1 
End If 
Wend 

'month ends at j 

currentMonth = Cells(i, 1).Characters(1, j - 1).Text 
If currentMonth < 10 Then 
theYear = Cells(i, 1).Characters(5, 2).Text 
Else 
theYear = Cells(i, 1).Characters(6, 2).Text 
End If 

If theYear > 50 Then 'add prefix to the year 
theYear = "19" & theYear 
Else 
theYear = "20" & theYear 
End If 

'check to see if the current year is less than the first year 
If theYear < firstYear Then 
firstYear = theYear 
End If 

If theYear > lastYear Then 
lastYear = theYear 
End If 

found = False 

Next i 'done scanning the sheet now redim the array 
'ReDim yearsMonths(firstYear To lastYear, 1 To 12) 

End If 'z = 1 

ReDim yearsMonths(firstYear To lastYear, 1 To 12) 

For i = 4 To numRows 'now process all the nTuple sheets 
'determine current year 

j = 1 

While found = False 

testchar = Cells(i, 1).Characters(j, 1).Text 
If testchar = "/" Then 
found = True 
Else 
j = j + 1 
End If 
Wend 

'month ends at j 

currentMonth = Cells(i, 1).Characters(1, j - 1).Text 
If currentMonth < 10 Then 
theYear = Cells(i, 1).Characters(5, 2).Text 
Else 
theYear = Cells(i, 1).Characters(6, 2).Text 
End If 

If theYear > 50 Then 'add prefix to the year 
theYear = "19" & theYear 
Else 

theYear = "20" & theYear 
End If 

yearsMonths(theYear, currentMonth) = Sheets(summarySheet).Cells(i, 2) 
found = False 
Next i 

'output yearsArray 

'Sheets(summarySheet).Cells(3, 4) = "Years" 
'Sheets(summarySheet).Cells(3, 5) = "instances" 

counter = 4 'for output 
monthCounter = 1 
lastValue = 0 


For j = firstYear To lastYear 
For k = 1 To 12 


If "" & j = firstYear And k = 1 Then 'label the q levels 
Sheets(overallQSummary).Cells(counter - 1, z + 2) = z 
End If 

If z = 1 Then 'put the month/year 

Sheets(overallQSummary).Cells(counter, z) = "'" & monthCounter & "/" & j 
Sheets(overallQSummary).Cells(counter, z + 1) = 0 
End If 

currentValue = yearsMonths(j, monthCounter) 

If mass = True Then 
multiplyer = z 
Else 

multiplyer = 1 
End If 

If currentValue > lastValue Then 

Sheets(overallQSummary).Cells(counter, z + 2) = currentValue * multiplyer 
lastValue = currentValue 
Else 

Sheets(overallQSummary).Cells(counter, z + 2) = lastValue * multiplyer 
End If 

monthCounter = monthCounter + 1 
If monthCounter > 12 Then 
monthCounter = 1 
End If 

counter = counter + 1 

Next k 
Next j 

If z = ntuples Then 

Sheets(overallQSummary).Cells(counter - 1, z + 3) = "=SUM(" & col(2) & counter - 1 _ 
& ":" & col(nTuples + 2) & counter - 1 & ")" 

End If 


Next z 

'total of all the qlevels: 

Sheets(overallQSummary).Cells(1, 1) = "=SUM(" & col(2) & counter - 1 & ":" & col(ntuples + 2) _ 
& counter - 1 & ")" 


'copy chart 

' currFilename = Application.ActiveWorkbook.Name 
' Windows("AffiliationMacro.xls").Activate 
' Sheets("q_level_yr").Select 

' Sheets("q_level_yr").Copy After:=Workbooks(currFilename).Sheets(Sheets.Count) 

' counter = 4 
' columnStart = 2 
' For z = 2 To nTuples 

' ActiveChart.SeriesCollection(z - 1).Values = "=" & overallQSummary & "!R" & counter & _ 
' "C" & columnStart & ":R" & counter & "C" & nTuples 
' counter = counter + 1 
' Next z 

End Sub 'qlevel months 
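
The slash-scanning loops in qLevelMonths read the month and a two-digit year out of each time-step label by character position. A minimal sketch of the same idea follows, assuming labels of the form m/d/yy (this is what the character offsets suggest; the listing does not state it). ParseTimeLabel is a hypothetical helper that does not appear in the original module; it uses the built-in Split function rather than Range.Characters.

```vba
' Hypothetical helper (not part of the original module): splits an "m/d/yy"
' label into a month number and a four-digit year. It applies the same pivot
' as the listing: two-digit years above 50 become 19xx, all others 20xx.
Sub ParseTimeLabel(ByVal label As String, ByRef monthOut As Integer, ByRef yearOut As Integer)
    Dim parts() As String
    Dim yy As Integer

    parts = Split(label, "/")          ' e.g. "3/1/99" -> "3", "1", "99"
    monthOut = CInt(parts(0))          ' first field is the month
    yy = CInt(parts(UBound(parts)))    ' last field is the two-digit year

    If yy > 50 Then
        yearOut = 1900 + yy
    Else
        yearOut = 2000 + yy
    End If
End Sub
```

Because the fields are located by the separator rather than by fixed offsets, the sketch would also tolerate two-digit days, which the fixed offsets (5 or 6) in the listing would not.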




sub: qLevelMonthsCOUNT 
Author: Matt Behnke 
Created: 2/17/02 

Description: uses an array of years and months to store the number of instances for each timestep; 
outputs the number of terms in the vocab per month / year 
inputs: qlevelType: author or term 
prefix: the sheet prefix, which depends on whether it is author or term 
numqLevels: the number of q levels 

Outputs: 


Sub qLevelMonthsCount(ByVal qLevelType As String, ByVal prefix As String, _ 
ByVal numqLevels As Integer) 


'number of ntuples 
ntuples = numqLevels 


Dim years As Variant 
Dim yearsMonths As Variant 


firstYear = "2500" 
lastYear = 0 


overallQSummary = "q_summary_monthly_count_count" 

Sheets.Add After:=Sheets(Sheets.Count) 
Sheets(Sheets.Count).Select 
ActiveSheet.Name = overallQSummary 

Sheets(overallQSummary).Cells(1, 1) = " " 
Sheets(overallQSummary).Cells(2, 1) = " " 
Sheets(overallQSummary).Cells(3, 1) = " " 

Cells(4, 2).Select 
ActiveWindow.FreezePanes = True 


years = Array() 
yearsMonths = Array() 

'ReDim yearsMonths(0 To 0, 1 To 12) 

For z = 1 To ntuples 

'get the source sheet's name 
If z < 10 Then 

summarySheet = prefix & "0" & z & "_" & qLevelType & "_summary" 

Else 

summarySheet = prefix & "" & z & "_" & qLevelType & "_summary" 

End If 

numRows = CountRows(summarySheet, 1) 

'scan the first sheet to get the last and first year to get the array bounds 
If z = 1 Then 

For i = 4 To numRows 

'determine current year 

j = 1 

While found = False 

testchar = Cells(i, 1).Characters(j, 1).Text 
If testchar = "/" Then 
found = True 
Else 
j = j + 1 
End If 
Wend 

'month ends at j 

currentMonth = Cells(i, 1).Characters(1, j - 1).Text 
If currentMonth < 10 Then 
theYear = Cells(i, 1).Characters(5, 2).Text 
Else 
theYear = Cells(i, 1).Characters(6, 2).Text 
End If 

If theYear > 50 Then 'add prefix to the year 
theYear = "19" & theYear 
Else 
theYear = "20" & theYear 
End If 

'check to see if the current year is less than the first year 
If theYear < firstYear Then 
firstYear = theYear 
End If 

If theYear > lastYear Then 
lastYear = theYear 
End If 

found = False 

Next i 'done scanning the sheet now redim the array 
'ReDim yearsMonths(firstYear To lastYear, 1 To 12) 

End If 'z = 1 

ReDim yearsMonths(firstYear To lastYear, 1 To 12) 

For i = 4 To numRows 'now process all the nTuple sheets 
'determine current year 

j = 1 

While found = False 

testchar = Cells(i, 1).Characters(j, 1).Text 
If testchar = "/" Then 
found = True 
Else 
j = j + 1 
End If 
Wend 

'month ends at j 

currentMonth = Cells(i, 1).Characters(1, j - 1).Text 
If currentMonth < 10 Then 
theYear = Cells(i, 1).Characters(5, 2).Text 
Else 
theYear = Cells(i, 1).Characters(6, 2).Text 
End If 

If theYear > 50 Then 'add prefix to the year 
theYear = "19" & theYear 
Else 

theYear = "20" & theYear 
End If 

yearsMonths(theYear, currentMonth) = Sheets(summarySheet).Cells(i, 3) 'count 
found = False 
Next i 

counter = 4 'for output 
monthCounter = 1 
lastValue = 0 
'numYears = UBound(yearsMonthsa) 

'FIX THIS*** 

For j = firstYear To lastYear 
For k = 1 To 12 

If "" & j = firstYear And k = 1 Then 'label the q levels 
Sheets(overallQSummary).Cells(counter - 1, z + 2) = z 
End If 

If z = 1 Then 'put the month/year 

Sheets(overallQSummary).Cells(counter, z) = "'" & monthCounter & "/" & j 
Sheets(overallQSummary).Cells(counter, z + 1) = 0 
End If 

currentValue = yearsMonths(j, monthCounter) 

If mass = True Then 
multiplyer = z 
Else 

multiplyer = 1 
End If 

If currentValue > lastValue Then 

Sheets(overallQSummary).Cells(counter, z + 2) = currentValue * multiplyer 
lastValue = currentValue 
Else 

Sheets(overallQSummary).Cells(counter, z + 2) = lastValue * multiplyer 
End If 

monthCounter = monthCounter + 1 
If monthCounter > 12 Then 
monthCounter = 1 
End If 

counter = counter + 1 

Next k 
Next j 

' If z = nTuples Then 
' Sheets(overallQSummary).Cells(counter - 1, z + 3) = "=SUM(" & col(2) & counter - 1 _ 
' & ":" & col(nTuples + 2) & counter - 1 & ")" 
' End If 


Next z 

'total of all the qlevels: 

Sheets(overallQSummary).Cells(1, 1) = "=SUM(" & col(2) & counter - 1 & ":" & col(ntuples + 2) _ 
& counter - 1 & ")" 

End Sub 'qLevelMonthsCount 


sub: qLevelEntropy 
Author: Matt Behnke 
Created: 2/4/02 
finished: 3/11/02 

Description: uses an array of years and months to store the amount of entropy for each timestep 

outputs the cumulative entropy per timestep per q level 
inputs: qlevelType: author or term 

prefix: the sheet prefix, changes whether its author or term 
numqLevels: the number of q levels 
Outputs: 


Sub qLevelEntropy(ByVal qLevelType As String, ByVal prefix As String, _ 
ByVal numqLevels As Integer) 


'number of ntuples 
ntuples = numqLevels 

Dim years As Variant 

Dim yearsMonths As Variant 

Dim contributionEntropy As Variant 

firstYear = "2500" 
lastYear = 0 

overallQEntropy = "q_local_entropy_monthly" 
contributionEntropySheet = "q_contribution_entropy_monthly" 
countSheet = "q_summary_monthly_count" 

'add the contribution entropy sheet 
Sheets.Add After:=Sheets(Sheets.Count) 
Sheets(Sheets.Count).Select 
ActiveSheet.Name = contributionEntropySheet 
Cells(4, 2).Select 

ActiveWindow.FreezePanes = True 

'add the local entropy sheet 

Sheets.Add After:=Sheets(Sheets.Count) 
Sheets(Sheets.Count).Select 
ActiveSheet.Name = overallQEntropy 
Cells(4, 2).Select 

ActiveWindow.FreezePanes = True 

years = Array() 
yearsMonths = Array() 
contributionEntropy = Array() 

'ReDim yearsMonths(0 To 0, 1 To 12) 
For z = 1 To ntuples 

'get the source sheet's name 
If z < 10 Then 

summarySheet = prefix & "0" & z & "_" & qLevelType & "_month" 

Else 

summarySheet = prefix & "" & z & "_" & qLevelType & "_month" 

End If 

numRows = CountRows(summarySheet, 1) 
numCols = CountCols(summarySheet, 1) 

'find the row that contains the time counts for the current time step... 

'scan the first sheet to get the last and first year to get the array bounds 
If z = 1 Then 

For i = 4 To numCols 

'determine current year 

j = 1 

While found = False 

testchar = Cells(1, i).Characters(j, 1).Text 
If testchar = "/" Then 
found = True 
Else 
j = j + 1 
End If 
Wend 

'month ends at j 

currentMonth = Cells(1, i).Characters(1, j - 1).Text 
If currentMonth < 10 Then 
theYear = Cells(1, i).Characters(5, 2).Text 
Else 
theYear = Cells(1, i).Characters(6, 2).Text 
End If 

If theYear > 50 Then 'add prefix to the year 
theYear = "19" & theYear 
Else 
theYear = "20" & theYear 
End If 

'check to see if the current year is less than the first year 
If theYear < firstYear Then 
firstYear = theYear 
End If 

If theYear > lastYear Then 
lastYear = theYear 
End If 

found = False 

Next i 'done scanning the sheet now redim the array 
'ReDim yearsMonths(firstYear To lastYear, 1 To 12) 

End If 'z = 1 

ReDim yearsMonths(firstYear To lastYear, 1 To 12) 

ReDim contributionEntropy(firstYear To lastYear, 1 To 12) 

For i = 4 To numCols 'now process all the nTuple sheets 

'determine current year 

j = 1 

While found = False 

testchar = Sheets(summarySheet).Cells(1, i).Characters(j, 1).Text 
If testchar = "/" Then 
found = True 
Else 
j = j + 1 
'found = True 
End If 
Wend 

'month ends at j 

currentMonth = Sheets(summarySheet).Cells(1, i).Characters(1, j - 1).Text 
If currentMonth < 10 Then 
theYear = Sheets(summarySheet).Cells(1, i).Characters(5, 2).Text 
Else 
theYear = Sheets(summarySheet).Cells(1, i).Characters(6, 2).Text 
End If 

If theYear > 50 Then 'add prefix to the year 
theYear = "19" & theYear 
Else 
theYear = "20" & theYear 
End If 

timeStep = "" & currentMonth & "/" & theYear 
timeStepRow = findStringInSheet(countSheet, timeStep, "A") 

' temp = Sheets(summarySheet).Cells(1, 1) 
' Sheets(summarySheet).Cells(1, 1) = timeStepRange 
' timeStepRow = Sheets(summarySheet).Cells(1, 1).Characters(4, 5).Text 
' Sheets(summarySheet).Cells(1, 1) = temp 

totalInstances = Sheets(countSheet).Cells(1, 1) 'timeStepRow, ntuples + 3) '0_Q 
localSumInstances = Sheets(countSheet).Cells(timeStepRow, z + 2) '0_q_k 

' Sh(A)_k_q = -(num instances A at k in q_lvl / sum of instances at k in q_lvl) 
'             * log2(num instances A at k in q_lvl / sum of instances at k in q_lvl) 
' 
' Sh(k)_q = Sh(A)_k + Sh(B)_k + ... 
' 
' contribution Cs_qlevel_k = abs[0_q_k / 0_Q] * Sh(k)_q + (0_q_k / 0_Q) * log2(0_Q / 0_q_k) 
' where 0_q_k = sum instances at k in q_lvl 
'       0_Q   = sum instances at all Q lvls 

'traverse all the rows in the summarySheet to get the num of instances of each term 
'and the entropies of each term 
'store the sum of the entropies of each term in step k in the yearsMonths array. 

For j = 2 To numRows 

If Sheets(summarySheet).Cells(j, i) > 0 Then 
theValue = Sheets(summarySheet).Cells(j, i) 

entropy = (-theValue / localSumInstances) * (Log(theValue / localSumInstances) / _ 
Log(2)) 

yearsMonths(theYear, currentMonth) = yearsMonths(theYear, currentMonth) + entropy 
End If 

'at the last term in the time step compute the contribution entropy 
If j = numRows Then 

contributionEntropy(theYear, currentMonth) = Abs(localSumInstances / totalInstances) _ 
* yearsMonths(theYear, currentMonth) + ((localSumInstances / totalInstances) _ 
* (Log(totalInstances / localSumInstances) / Log(2))) 

End If 


Next j 

found = False 
Next i 


'output yearsArray 

'Sheets(summarySheet).Cells(3, 4) = "Years" 
'Sheets(summarySheet).Cells(3, 5) = "instances" 

counter = 4 'for output 
monthCounter = 1 
lastValue = 0 
lastContributionValue = 0 

For j = firstYear To lastYear 

For k = 1 To 12 

If "" & j = firstYear And k = 1 Then 'label the q level 
Sheets(overallQEntropy).Cells(counter - 1, z + 2) = z 
End If 

If z = 1 Then 'put the month/year 

Sheets(overallQEntropy).Cells(counter, z) = "'" & monthCounter & "/" & j 
Sheets(overallQEntropy).Cells(counter, z + 1) = 0 

Sheets("q_contribution_entropy_monthly").Cells(counter, z) = "'" & monthCounter & _ 
"/" & j 
Sheets("q_contribution_entropy_monthly").Cells(counter, z + 1) = 0 
End If 

currentValue = yearsMonths(j, monthCounter) 
currentContributionValue = contributionEntropy(j, monthCounter) 

If currentValue > lastValue Then 

Sheets(overallQEntropy).Cells(counter, z + 2) = currentValue 

lastValue = currentValue 
Else 

Sheets(overallQEntropy).Cells(counter, z + 2) = lastValue 
End If 

If currentContributionValue > lastContributionValue Then 

Sheets(contributionEntropySheet).Cells(counter, z + 2) = currentContributionValue 
lastContributionValue = currentContributionValue 
Else 

Sheets(contributionEntropySheet).Cells(counter, z + 2) = lastContributionValue 
End If 

monthCounter = monthCounter + 1 
If monthCounter >12 Then 
monthCounter = 1 
End If 

counter = counter + 1 

Next k 
Next j 
Next z 

'total of all the qlevels: 

'Sheets(overallQEntropy).Cells(1, 1) = "=SUM(" & col(2) & counter & ":" & col(nTuples + 2) & counter 


End Sub ' qLevelEntropy 
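
The two quantities computed in qLevelEntropy can be isolated as pure functions, which makes the arithmetic easier to check by hand. The sketch below is illustrative only: ShannonEntropyBits and ContributionEntropy are hypothetical names that do not appear in the original module, and the entropy function normalizes by the sum of the counts passed in, whereas the listing reads the normalizing sum from the count sheet.

```vba
' Hypothetical helpers, not part of the original module.

' Shannon entropy, in bits, of a set of non-negative counts.
Function ShannonEntropyBits(counts As Variant) As Double
    Dim total As Double, p As Double, h As Double
    Dim i As Long

    For i = LBound(counts) To UBound(counts)
        total = total + counts(i)
    Next i
    If total <= 0 Then Exit Function          ' returns 0 for an empty time step

    For i = LBound(counts) To UBound(counts)
        If counts(i) > 0 Then                 ' zero counts contribute nothing
            p = counts(i) / total
            h = h - p * (Log(p) / Log(2))     ' VBA Log is the natural logarithm
        End If
    Next i
    ShannonEntropyBits = h
End Function

' Contribution entropy of one q level at one time step:
' |levelSum / grandTotal| * levelEntropy + (levelSum / grandTotal) * log2(grandTotal / levelSum)
Function ContributionEntropy(ByVal levelSum As Double, ByVal grandTotal As Double, _
                             ByVal levelEntropy As Double) As Double
    Dim w As Double
    If levelSum <= 0 Or grandTotal <= 0 Then Exit Function
    w = levelSum / grandTotal
    ContributionEntropy = Abs(w) * levelEntropy + w * (Log(grandTotal / levelSum) / Log(2))
End Function
```

For example, two terms with two instances each give an entropy of 1 bit; if that q level holds 4 of 8 total instances, its contribution is 0.5 * 1 + 0.5 * log2(2) = 1.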


sub: qLevelEntropy2 — OBSOLETE?? 

Author: Matt Behnke 
Created: 2/4/02 

Description: uses an array of years and months to store the amount of entropy for each timestep 

outputs the cumulative entropy per timestep per q level 
inputs: numQlevels 
Outputs: 


Sub qLevelEntropy2(ByVal numqLevels As Integer) 


'the source datasheet contains the timesteps and the count of instances in each q level per time step.. 

dataSheet = "q_summary_monthly_count" 
tempSheet = "q_level_monthly_Entropy" 

numRows = CountRows(dataSheet, 1) 
numValues = ntuples 

Sheets.Add After:=Sheets(Sheets.Count) 

Sheets(Sheets.Count).Select 
ActiveSheet.Name = tempSheet 

Cells(4, 2).Select 

ActiveWindow.FreezePanes = True 

'copy the sheet.. dataSheet --> tempSheet 
Sheets(dataSheet).Select 
Cells.Select 
Selection.Copy 
Sheets(tempSheet).Select 
Range("A1").Select 
ActiveSheet.Paste 


totalInstances = Sheets(dataSheet).Cells(1, 1) 'The total num of instances over the whole dataset 
For k = 4 To numRows 'traverse all the steps 

For z = 1 To numqLevels 'traverse all the nTuples 
If Sheets(dataSheet).Cells(k, z + 2) > 0 Then 
theValue = Sheets(dataSheet).Cells(k, z + 2) 

entropy = (-theValue / totalInstances) * (Log(theValue / totalInstances) / Log(2)) 
Sheets(tempSheet).Cells(k, z + 2) = entropy 
End If 
Next z 

Next k 


End Sub ' qlevelEntropy2 


sub: qLevelMonthTemp 
Author: Matt Behnke 
Created: 2/28/02 

Description: uses the counts from the q level month sheet to: 

1) store the instances of each time step into an array 
2) calculate the probabilities of each instance 
3) determine the x & y values 
4) determine the alpha, beta values from the curvefit 
5) output timesteps, and alpha and beta onto a new sheet 
coeff(1) = A 
coeff(2) = B 
alpha = B 
beta = e^(-A/alpha) 

inputs: numqLevels - the number of qlevels nTuples.. 

Outputs: 
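
The relations alpha = B and beta = e^(-A/alpha) in the description follow from linearizing the Weibull distribution function P = 1 - exp(-(q/beta)^alpha): taking logarithms twice gives ln(-ln(1 - P)) = alpha * ln(q) - alpha * ln(beta), a straight line y = A + B*x with x = ln(q). The sketch below shows that step in isolation. WeibullFromLine is a hypothetical name, and an ordinary least-squares line stands in for the curveFit routine, whose implementation is not shown in this section.

```vba
' Hypothetical sketch, not part of the original module. Fits y = intercept + slope * x
' by ordinary least squares and converts the line to Weibull parameters.
Sub WeibullFromLine(x As Variant, y As Variant, ByRef alpha As Double, ByRef beta As Double)
    Dim n As Long, i As Long
    Dim sx As Double, sy As Double, sxx As Double, sxy As Double
    Dim intercept As Double, slope As Double, denom As Double

    For i = LBound(x) To UBound(x)
        sx = sx + x(i)
        sy = sy + y(i)
        sxx = sxx + x(i) * x(i)
        sxy = sxy + x(i) * y(i)
        n = n + 1
    Next i

    denom = n * sxx - sx * sx
    If denom = 0 Then Exit Sub                ' fewer than two distinct x values: no fit
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n

    alpha = slope                             ' alpha = B
    If alpha <> 0 Then beta = Exp(-intercept / alpha)   ' beta = e^(-A/alpha)
End Sub
```

Feeding the routine points generated from a known line, for instance y = 2x - 2 ln(3), should recover alpha = 2 and beta = 3 up to rounding.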


Sub qLevelMonthTemp(ByVal numqLevels As Integer) 

'number of ntuples 
ntuples = numqLevels 


Dim instances_q(64) As Variant 'stores the instances in each q_level q_i_k 
Dim probabilities(64) As Variant 'stores the probabilities of instances P(q_i_k) 
Dim x_values(64) As Variant 'x_values [X: ln(qi+r)] -weibull 
Dim y_values(64) As Variant 'y_values [Y: ln[-ln(1-P(qi))]] -weibull 
Dim coefficients As Variant 'coefficients 
Dim numRows As Integer 'number of rows on the datasheet 
Dim numInstances_k As Double 'total num of instances at step k 
Dim gamma As Double 'the shift, r 
Dim numValues As Integer 'the number of values in the x,y arrays 


gamma = 0# 
'the source datasheet contains the timesteps and the count of instances in each q level per time step.. 

dataSheet = "q_summary_monthly_count" 
tempSheet = "q_level_monthly_temp" 

numRows = CountRows(dataSheet, 1) 
numValues = ntuples 

Sheets.Add After:=Sheets(Sheets.Count) 
Sheets(Sheets.Count).Select 
ActiveSheet.Name = tempSheet 

Cells(4, 2).Select 

ActiveWindow.FreezePanes = True 

Sheets(tempSheet).Cells(1, 1) = " " 
Sheets(tempSheet).Cells(2, 1) = " " 
Sheets(tempSheet).Cells(3, 1) = "k" 
Sheets(tempSheet).Cells(3, 2) = "interval" 
Sheets(tempSheet).Cells(3, 3) = "A" 
Sheets(tempSheet).Cells(3, 4) = "B" 
Sheets(tempSheet).Cells(3, 5) = "alpha = B" 
Sheets(tempSheet).Cells(3, 6) = "beta = e^(-A/alpha)" 
Sheets(tempSheet).Cells(2, 6) = "T = beta" 

'numInstances_k = Sheets(dataSheet).Cells(1, 1) 'The total num of instances over the whole dataset 


For k = 4 To numRows 'traverse all the steps 

numInstances_k = Sheets(dataSheet).Cells(k, ntuples + 3) 'columns are offset by 2 
Sheets(tempSheet).Cells(k, 1) = k - 3 'timestep 
Sheets(tempSheet).Cells(k, 2) = Sheets(dataSheet).Cells(k, 1) 'interval 

num_q_at_k = 0 

For z = 1 To ntuples 'traverse all the nTuples 

instances_q(z) = Sheets(dataSheet).Cells(k, z + 2) 
probabilities(z) = 0 'reinit array 
If instances_q(z) > 0 Then 

probabilities(z) = instances_q(z) / numInstances_k 
'ln = log(x) / log(exp(1)) 

x_values(z) = Log(instances_q(z) + gamma) / Log(Exp(1)) 'Ln(instances_q(z) + gamma) 
y_values(z) = Log((-Log(1 - probabilities(z)) / Log(Exp(1)))) / Log(Exp(1)) 'Ln(-Ln(1 - probabilities(z))) 

num_q_at_k = num_q_at_k + 1 

'DEBUG********8 

Sheets(tempSheet).Cells(3, 10) = "actual probabilities" 
Sheets(tempSheet).Cells(k, z + 9) = probabilities(z) 


Else 

End If 
Next z 

If num_q_at_k > 0 Then 

coefficients = curveFit(num_q_at_k, x_values, y_values, 1) 


Sheets(tempSheet).Cells(k, 3) = coefficients(1) 'A 
Sheets(tempSheet).Cells(k, 4) = coefficients(2) 'B 
Sheets(tempSheet).Cells(k, 5) = coefficients(2) 'alpha = B 
If Not coefficients(2) = 0 Then 

Sheets(tempSheet).Cells(k, 6) = Exp(-coefficients(1) / coefficients(2)) 'beta 
Else 

Sheets(tempSheet).Cells(k, 6) = 0 'beta 
End If 

End If 'q at k > 0 
Next k 

'debug::::::: 

For k = 4 To numRows 
For z = 1 To ntuples 

instances_q(z) = Sheets(dataSheet).Cells(k, z + 2) 

If instances_q(z) > 0 Then 
'DEBUG************** 
' 
' p(q_i) = 1-e^(q_i/beta)^alpha 
' 
If Sheets(tempSheet).Cells(k, 6) > 0 Then 
Dim p_q_i_calced As Double 
Dim beta As Double 
Dim alpha As Double 
beta = Sheets(tempSheet).Cells(k, 6).Value 
alpha = Sheets(tempSheet).Cells(k, 5).Value 
'Sheets(tempSheet).Cells(1, 1) = instances_q(z) 

'=(1-EXP(B3/$M$7)^$L$7)*-1 
' 
'p_q_i_calced = 1 - (1 / (Exp(instances_q(z) / beta) ^ alpha)) 

Sheets(tempSheet).Cells(3, 40) = "calculated probabilities" 

Sheets(tempSheet).Cells(k, z + 39) = "=(1-EXP(-" & instances_q(z) & "/" & beta & _ 
")^" & alpha & ")" 

End If 

End If 
Next z 
Next k 
End Sub 


sub: fillMonthsRow 
Author: Matt Behnke 
Created: 12/11/01 

Description: fills in the months if they are missing; inserts a row. 
Works on lists; must be run 2-3 times to ensure all are filled. 
inputs: sheetName, startRow 
Outputs: 


Sub fillMonthsRow(ByVal sheetName As String, ByVal startRow As Integer) ' ByVal sheetName As String) 


'sheetName = ActiveSheet.Name 

Sheets(sheetName).Select 
numColumns = CountCols(sheetName, 1) 
numRows = CountRows(sheetName, 1) 
Dim theMonth As Integer 

'startRow = 4 

counter = 1 

monthCounter = "" & counter 
For i = startRow To numRows 

'step!! 

j = 1 

While found = False 

testchar = Cells(i, 2).Characters(j, 1).Text 
If testchar = "/" Then 
found = True 
Else 
j = j + 1 
End If 
Wend 

'month ends at j 

currentMonth = Cells(i, 2).Characters(1, j - 1).Text 
If currentMonth < 10 Then 
restofDate = Cells(i, 2).Characters(2, 5).Text 
Else 
restofDate = Cells(i, 2).Characters(3, 5).Text 
End If 

theMonth = currentMonth 
If theMonth > monthCounter Then 
While theMonth > monthCounter 
If i = startRow Then 

'Range(Cells(i, 1), Cells(i, 1)).Select DONT ADD TIMESTEPS before Starting 
'Selection.EntireRow.Insert 
'Cells(i, 2) = "" & monthCounter & restofDate 

'Cells(i, 3) = 0 
'Cells(i, 6) = 0 
Else 

Range(Cells(i, 1), Cells(i, 1)).Select 
Selection.EntireRow.Insert 

'copy previous row 

Rows(i - 1 & ":" & i - 1).Select 
Selection.Copy 
Rows(i & ":" & i).Select 
ActiveSheet.Paste 

Cells(i, 2) = "" & monthCounter & restofDate 

End If 

Cells(i, 1) = i - startRow + 1 
counter = counter + 1 
If counter = 13 Then 
counter = 1 
End If 

monthCounter = "" & counter 
i = i + 1 

numRows = numRows + 1 
Wend 
End If 

Cells(i, 1) = i - startRow + 1 

counter = counter + 1 
If counter = 13 Then 
counter = 1 
End If 

monthCounter = "" & counter 
found = False 
Next i 

End Sub ' fill months rows 

Sub fillMonthsRowTrigger() 

Call fillMonthsRow("A_Band_Stats", 14) 

Call fillMonthsRow("B_Band_Stats", 14) 

Call fillMonthsRow("C_Band_Stats", 14) 

Call fillMonthsRow("D_Band_Stats", 14) 

Call fillMonthsRow("A_Band_Stats", 14) 

Call fillMonthsRow("B_Band_Stats", 14) 

Call fillMonthsRow("C_Band_Stats", 14) 

Call fillMonthsRow("D_Band_Stats", 14) 

Call fillMonthsRow("World_Stats", 14) 
Call fillMonthsRow("World_Stats", 14) 

Call fillMonthsRow("Affiliation_Summary_A_Band", 4) 
Call fillMonthsRow("Affiliation_Summary_B_Band", 4) 
Call fillMonthsRow("Affiliation_Summary_C_Band", 4) 
Call fillMonthsRow("Affiliation_Summary_D_Band", 4) 
Call fillMonthsRow("Affiliation_Summary_A_Band", 4) 
Call fillMonthsRow("Affiliation_Summary_B_Band", 4) 
Call fillMonthsRow("Affiliation_Summary_C_Band", 4) 
Call fillMonthsRow("Affiliation_Summary_D_Band", 4) 

Call fillMonthsRow("Affiliation_Summary", 4) 

Call fillMonthsRow("Affiliation_Summary", 4) 

Call fillMonthsRow("Entropy Summary", 4) 

Call fillMonthsRow("Entropy Summary", 4) 

End Sub 


sub assignAN() 

Author: Matt Behnke 
Created: 10/19/01 

Description: Assigns month/year to each INSPEC Accession number 

places the month/year on the title sheet, used to determine monthly values 


Sub assignAN() 

nRowsAN = CountRows("AN", 1) 
nRowsTitle = CountRows("Title", 1) 

Sheets("Title").Select 
Sheets("Title").Cells(1, 3) = "PubDate" 
Sheets("Title").Cells(1, 4) = "PubYear" 

For x = 2 To nRowsTitle 

compare = Sheets("Title").Cells(x, 1) 

For i = 2 To nRowsAN 

'find the accession number range that the AN from the title sheet 
'falls into 

timeInt = Sheets("AN").Cells(i, 1) 
startAN = Sheets("AN").Cells(i, 2) 
endAN = Sheets("AN").Cells(i, 3) 

If compare >= startAN And compare <= endAN Then 

Sheets("Title").Cells(x, 3) = timeInt 'when found put the month/year 
Sheets("Title").Cells(x, 4) = Year(timeInt) 

End If 
Next i 
Next x 

End Sub 


sub putFirstAuthor() 

Author: Matt Behnke 
Created: 12/19/01 

Description: Fills in holes in the list of affiliations. 

Goes through the list of authors; if the accession number of the author is not found in affiliations, 
places the first author name into the list of affiliations. 

Can be used afterwards to check whether there are records without affiliation or author. 
After running it on author accession numbers, run it on title accession numbers. 


Sub putFirstAuthor() 

authors = "Authors (Cleaned)" 

'authors = "Title" 

affiliation = "Affiliation (Cleaned)" 

nRowsAff = CountRows(affiliation, 1) 
nRowsAuth = CountRows(authors, 1) 
n = 1 'counter 
found = False 
lastAN = 0 

For i = 2 To nRowsAuth 

authorNum = Sheets(authors).Cells(i, 1) 
authorName = Sheets(authors).Cells(i, 2) 

For j = 2 To nRowsAff 

affiliationNum = Sheets(affiliation).Cells(j, 1) 

If affiliationNum = authorNum Or lastAN = authorNum Then 
found = True 
End If 
Next j 

If found = False Then 

Sheets(affiliation).Cells(nRowsAff + n, 1) = authorNum 
Sheets(affiliation).Cells(nRowsAff + n, 2) = authorName 
n = n + 1 

lastAN = authorNum 
End If 

found = False 
Next i 

End Sub 


' sub v_calc_v_psi_sheet() 
' Author: Matt Behnke 
' Created: 2/22/02 
' Description: creates a sheet, either by band or world, that holds the results of v and psi, 
' where v = [(num records at an instance in step k) / (total num of records at step k)] 
'         / [(num of authors at an instance in step k) / (total num of authors at step k)] 
' psi (tasks per timestep on avg) = v / timestep 
' inputs: band - the source.. 


Sub v_calc_v_psi_sheet(ByVal band As String) 

'Dim authorMatrixTotals As Variant 
'Dim affiliationMatrixTotals As Variant 
Dim sum_v_array As Variant 

'N_i_k = the number of records produced by affiliation i at timestep k 
'N_Total_k = the number of records produced by all affiliations at timestep k 
'P_i_k = the number of authors who published in affiliation i at timestep k 
'P_Total_k = the number of authors who published in all affiliations at timestep k 

authorMatrixTotals = Array() 
affiliationMatrixTotals = Array() 

Sheets.Add After:=Worksheets(Worksheets.Count) 
Sheets(Worksheets.Count).Select 

ActiveSheet.Name = "v_calculation_" & band 
currentSheetName = ActiveSheet.Name 

'move the sheet so it is by related sheets 
' Sheets(currentSheetName).Move Before:=Sheets("" & band & "_Stats") 

If band = "World" Then 

affiliationMatrix = dataSheet 
authorMatrix = "Affiliationauthors" 

Else 

affiliationMatrix = "Affiliation Cum Dist " & band 
authorMatrix = "Aff_Author_Cum_Dist_" & band 
End If 

numRowsInAffiliationMatrix = CountRows(affiliationMatrix, 1) 
numColsInAffiliationMatrix = CountCols(affiliationMatrix, 1) 

numRowsInAuthorMatrix = CountRows(authorMatrix, 1) 
numColsInAuthorMatrix = CountCols(authorMatrix, 1) 

'ReDim affiliationMatrixTotals(4 To numColsInAffiliationMatrix) 

'ReDim authorMatrixTotals(4 To numColsInAuthorMatrix) 
ReDim sum_v_array(4 To numColsInAuthorMatrix) 

'headers 

Sheets(currentSheetName).Cells(1, 1) = " " 
Sheets(currentSheetName).Cells(1, 2) = " " 
Sheets(currentSheetName).Cells(3, 1) = "Time Step" 
Sheets(currentSheetName).Cells(3, 2) = "interval" 
Sheets(currentSheetName).Cells(3, 3) = "v" 
Sheets(currentSheetName).Cells(3, 4) = "psi" 

For i = 2 To numRowsInAuthorMatrix 
For j = 4 To numColsInAffiliationMatrix 
If i = 2 Then 
timeStep = j - 3 

interval = Sheets(authorMatrix).Cells(1, j) 

Sheets(currentSheetName).Cells(j, 1) = timeStep 'the timestep 
Sheets(currentSheetName).Cells(j, 2) = interval 'the interval 
End If 

affiliationNameFromAuthorMatrix = Sheets(authorMatrix).Cells(i, 3) 

'find the row that contains the affiliation name from authors matrix in the affiliation matrix 
'sheet 

N_i_k_row = findStringInSheet(affiliationMatrix, affiliationNameFromAuthorMatrix, "C") 

'temp = Sheets(dataSheet).Cells(1, 1) 
'Sheets(dataSheet).Cells(1, 1) = N_i_k_range 
'N_i_k_row = Sheets(dataSheet).Cells(1, 1).Characters(4, 5).Text 
'Sheets(dataSheet).Cells(1, 1) = temp 

If j > 4 Then 'find the values at that instance... not cumulative 

P_i_k = Sheets(authorMatrix).Cells(i, j) - Sheets(authorMatrix).Cells(i, j - 1) 

N_i_k = Sheets(affiliationMatrix).Cells(N_i_k_row, j) _ 
- Sheets(affiliationMatrix).Cells(N_i_k_row, j - 1) 

N_total = Sheets(affiliationMatrix).Cells(numRowsInAffiliationMatrix + 4, j) _ 
- Sheets(affiliationMatrix).Cells(numRowsInAffiliationMatrix + 4, j - 1) 

P_total = Sheets(authorMatrix).Cells(numRowsInAuthorMatrix + 4, j) _ 
- Sheets(authorMatrix).Cells(numRowsInAuthorMatrix + 4, j - 1) 

Else 

P_i_k = Sheets(authorMatrix).Cells(i, j) 

N_i_k = Sheets(affiliationMatrix).Cells(N_i_k_row, j) 

N_total = Sheets(affiliationMatrix).Cells(numRowsInAffiliationMatrix + 4, j) 

P_total = Sheets(authorMatrix).Cells(numRowsInAuthorMatrix + 4, j) 

End If 
'calculate v 

' where v = [(num records at an instance in step k) / (total num of records at step k)] 
'         / [(num of authors at an instance in step k) / (total num of authors at step k)] 
If P_i_k > 0 And N_total > 0 Then 

v = (N_i_k / N_total) / (P_i_k / P_total) 
Else 
v = 0 

End If 

sum_v_array(j) = sum_v_array(j) + v 
'debug*****8 

Sheets(currentSheetName).Cells(i + 2, j + 4) = sum_v_array(j) 
Sheets(currentSheetName).Cells(1, 2) = j 
'debug 
Next j 

Sheets(currentSheetName).Cells(l, 1) = i 
Next i 

For j = 4 To numColsInAffiliationMatrix 
'output array of sum v.. make cumulative 
timeStep = j - 3 

If j > 4 Then 'cumulative 

Sheets(currentSheetName).Cells(j, 3) = sum_v_array(j) + _ 
Sheets(currentSheetName).Cells(j - 1, 3) 

Else 

Sheets(currentSheetName).Cells(j, 3) = sum_v_array(j) 

End If 

psi = Sheets(currentSheetName).Cells(j, 3) / timeStep 

Sheets(currentSheetName).Cells(j, 4) = psi 
Next j 

End Sub 'calc_v_psi_sheet 
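
The closing loop of calc_v_psi_sheet turns the per-step sums of v into a running total and divides that total by the time step to obtain psi. A minimal sketch of the same arithmetic on an in-memory array follows. The names `CumulativePsiSketch` and `DemoCumulativePsi` are illustrative only and are not part of the original listing; the sketch assumes a non-empty array of per-step sums.

```vba
' Hypothetical helper: given per-step sums, returns psi for each step,
' where psi(k) = (sum of the first k entries) / k.
Function CumulativePsiSketch(ByRef stepSums() As Double) As Double()
    Dim psi() As Double
    Dim runningTotal As Double
    Dim k As Long

    ReDim psi(LBound(stepSums) To UBound(stepSums))
    For k = LBound(stepSums) To UBound(stepSums)
        runningTotal = runningTotal + stepSums(k)
        ' Time step counted from 1, as timeStep = j - 3 does in the listing.
        psi(k) = runningTotal / (k - LBound(stepSums) + 1)
    Next k
    CumulativePsiSketch = psi
End Function

Sub DemoCumulativePsi()
    Dim sums(1 To 3) As Double
    Dim psi() As Double
    Dim k As Long

    sums(1) = 2: sums(2) = 4: sums(3) = 6
    psi = CumulativePsiSketch(sums)
    For k = 1 To 3
        Debug.Print k, psi(k)   ' running totals 2, 6, 12 divided by 1, 2, 3
    Next k
End Sub
```

Working on an array rather than on worksheet cells keeps the calculation separate from the sheet layout, which may make it easier to check by hand.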


' sub clearArray() 
' Author: Matt Behnke 
' Created: 2/22/02 
' 
' Description: clears the values stored in an array. 
' 
' inputs: lowerBound - lower bound of the array 
'         upperBound - upper bound of the array 
'         arrayName - the array 

Sub clearArray(ByVal lowerBound As Integer, ByVal upperBound As Integer, _ 
ByRef arrayName As Variant) 'ByRef so that the caller's array, not a copy, is cleared 


For i = lowerBound To upperBound 
arrayName(i) = "" 

Next i 

End Sub 

' function LinearInterpolation() 
' Author: Matt Behnke 
' Created: 2/26/02 
' 
' Description: linear interpolation used to calculate missing data: 
' 
'               [  X_i - X_low   ] 
' Y_i = Y_low + [ -------------- ] * (Y_high - Y_low) 
'               [ X_high - X_low ] 
' 
' inputs: Y_low - the closest "real" value to the left of the missing value 
'         Y_high - the closest "real" value to the right of the missing value 
'         X_low - the closest time step that has data to the left of the missing value 
'         X_high - the closest time step that has data to the right of the missing value 
'         X_i - the time step that has the missing data 
' output: returns a value, Y_i, for the missing time step. 


Function LinearInterpolation(ByVal Y_low As Double, ByVal Y_high As Double, ByVal X_low As Integer, _ 
ByVal X_high As Integer, ByVal X_i As Integer) As Double 
LinearInterpolation = Y_low + ((X_i - X_low) / (X_high - X_low)) * (Y_high - Y_low) 

End Function 
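
A short worked example may help to check the interpolation formula by hand. The sketch below uses a stand-alone copy of the formula under the hypothetical name `LerpSketch`; the sample values (10 at step 3, 16 at step 6) are invented for illustration.

```vba
' Hypothetical stand-alone copy of the interpolation formula, for illustration.
Function LerpSketch(ByVal yLow As Double, ByVal yHigh As Double, _
                    ByVal xLow As Long, ByVal xHigh As Long, _
                    ByVal xI As Long) As Double
    ' The "/" operator always yields a Double in VBA, so integer time steps
    ' do not truncate the fraction.
    LerpSketch = yLow + ((xI - xLow) / (xHigh - xLow)) * (yHigh - yLow)
End Function

Sub DemoLerp()
    ' Known values: 10 at step 3 and 16 at step 6; steps 4 and 5 are missing.
    Debug.Print LerpSketch(10, 16, 3, 6, 4)   ' one third of the way: about 12
    Debug.Print LerpSketch(10, 16, 3, 6, 5)   ' two thirds of the way: about 14
End Sub
```

The function assumes xHigh differs from xLow; equal positions would raise a division-by-zero error.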


' sub: FillInMissingData() 
' 
' Author: Matt Behnke 
' Created: 2/26/02 
' 
' Description: Stores a list of data in an array, traverses the array to find 
'              points where the data doesn't change. In our case where nothing was added 
'              due to lack of information (small holes in the dataset). 
' 
'              When an element that doesn't change is found a LinearInterpolation is performed 
'              to determine what the value should be. 
'              The value is changed and marked in red. 
' 
' inputs: dataSheet - the source of the data. 
'         columnNumber - the column that contains the data 
'         startRow - the row number where the data starts 
'         endRow - the row number where the data ends 


Sub FillInMissingData(ByVal dataSheet As String, ByVal columnNumber As Integer, _ 
ByVal startRow As Integer, ByVal endRow As Integer) 

Dim dataArray As Variant 
dataArray = Array() 

numTimeSteps = endRow - startRow + 1 
ReDim dataArray(1 To numTimeSteps) 

' numTimesteps (k) = 5 
' arrayindex (a) = 1 to 5 
' startrow = 3 
' endrow = 7 
' k  a  row 
' 1  1  3+0=3 
' 2  2  3+1=4 
' 3  3  3+2=5 
' 4  4  3+3=6 
' 5  5  3+4=7 

'populate the array with data 
For i = 1 To numTimeSteps 

dataArray(i) = Sheets(dataSheet).Cells(startRow + i - 1, columnNumber) 

Next i 

'analyze the array: 

For i = 1 To numTimeSteps 
If Not i = numTimeSteps Then 
currentValue = dataArray(i) 
nextValue = dataArray(i + 1) 

If currentValue = nextValue Then 

lowestDifferentValue = dataArray(i) 'Y_low 

lowestDifferentValuePosition = i 'X_low 

For j = i + 2 To numTimeSteps 'scan the array to find the next higher value 
nextHigherValue = dataArray(j) 'Y_high 

If Not nextHigherValue = currentValue Then 
nextHigherValuePosition = j 'X_high 

Exit For 'j 
End If 
Next j 

For k = lowestDifferentValuePosition + 1 To nextHigherValuePosition - 1 

'now the next lower and higher values are known along with their positions, 
'call LinearInterpolation 
'k = X_i, dataArray(k) = Y_i 

dataArray(k) = LinearInterpolation(lowestDifferentValue, nextHigherValue, _ 
lowestDifferentValuePosition, nextHigherValuePosition, k) 

Next k 

End If 'currentValue = nextValue 
End If 'not equal to numTimeSteps 
Next i 

'output the array 

For i = 1 To numTimeSteps 

Sheets(dataSheet).Cells(startRow + i - 1, columnNumber) = dataArray(i) 

Next i 
End Sub 'fill in missing data 

'TEST FILL IN MISSING DATA 
Sub testFillInMissing() 

'tests the linear interpolation function..can also be used as an interface to the function... 

dataSheet = InputBox("enter the name of the source sheet") 
columnNumber = InputBox("enter column number") 
rowStart = InputBox("enter the first row of data") 
rowEnd = InputBox("enter the last row of data") 

Call FillInMissingData(dataSheet, columnNumber, rowStart, rowEnd) 


End Sub 
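
The gap-filling idea in FillInMissingData can also be tried without a worksheet. The following is a minimal sketch, not the original routine: `FillFlatRunsSketch` and `DemoFillFlatRuns` are hypothetical names, the series is assumed to be a cumulative count held in a Double array, and a flat run that reaches the end of the series is left unchanged because it has no right-hand anchor.

```vba
' Hypothetical helper (not part of the original listing): fills flat runs in a
' series by linear interpolation between the value that starts the run and the
' next different value.
Sub FillFlatRunsSketch(ByRef series() As Double)
    Dim n As Long, i As Long, j As Long, k As Long
    Dim hiPos As Long

    n = UBound(series)
    i = LBound(series)
    Do While i < n
        If series(i) = series(i + 1) Then
            ' Look for the next element that differs from the flat value.
            hiPos = 0
            For j = i + 2 To n
                If series(j) <> series(i) Then
                    hiPos = j
                    Exit For
                End If
            Next j
            ' A flat run that reaches the end has no right-hand anchor.
            If hiPos = 0 Then Exit Do
            For k = i + 1 To hiPos - 1
                series(k) = series(i) + ((k - i) / (hiPos - i)) * (series(hiPos) - series(i))
            Next k
            i = hiPos
        Else
            i = i + 1
        End If
    Loop
End Sub

Sub DemoFillFlatRuns()
    Dim s(1 To 5) As Double
    Dim i As Long

    s(1) = 2: s(2) = 2: s(3) = 2: s(4) = 8: s(5) = 9
    FillFlatRunsSketch s
    For i = 1 To 5
        Debug.Print i, s(i)   ' steps 2 and 3 become roughly 4 and 6
    Next i
End Sub
```

Unlike the listing, this sketch jumps straight to the end of each filled run instead of re-examining the interpolated values, which is one possible design choice rather than the author's.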


' CurveFit 
' 
' Author: Erchuang (Al) Wang (original), Matt Behnke - converted to VB 
' Converted: 2/27/02 
' 
' Description: 
' This program will fit a curve up to 10th degree polynomial 
' in the form of Y = a0 + a1*x + a2*x^2 + ... + a(n)*x^(n) 
' where n is the degree of the polynomial and 1 <= n <= 30 
' Reads in two lists of numbers, X & Y-values and performs the fit 
' 
' inputs: dataSheet - sheet with the source values 
'         numValues - the number of values in the array 
'         x - array of the x values 
'         y - array of the y-values 
'         degree - the degree of the polynomial 
' Outputs: CoefficientArray - the coefficients of the equation a0, a1,... a(n) 

Function curveFit(ByVal numValues As Integer, _ 
ByVal x As Variant, ByVal y As Variant, ByVal degree As Integer) As Variant 

'numValues = endRow - startRow + 1 

'variables 

Dim coefficientArray(64) As Variant 'stores the results, max of 64 coeffi. 

'Dim x As Variant 'a one dimension array for x values 

'Dim y As Variant 'a one dimension array for y values 

Dim cn(64) As Variant 'right-hand side vector B of the normal equations 

Dim ar(64, 64) As Variant 'a two dimension array 

Dim an(64, 64) As Variant 'answer array 


Dim sum As Double 
Dim t As Double 
Dim d As Double 
Dim b As Double 

Dim j As Integer 'for loop counter 
Dim i As Integer 'for loop counter 
Dim m As Integer 'number of coefficients (degree + 1) 
Dim n As Integer '=numValues, number of data points 
Dim ii As Integer 'for loop counter 
Dim k As Integer 'for loop counter 
Dim nn As Integer 'column of the augmented matrix that holds B 
Dim nd As Integer '=degree, degree of poly 

n = numValues 
nd = degree 
m = nd + 1 
nn = m + 1 




'generate normal equation A and vector B of Ax=B 
For ii = 1 To n 
For j = 1 To m 

If j = 1 And x(ii) = 0# Then 
ar(ii, j) = 1# 

Else 

ar(ii, j) = x(ii) ^ (j - 1) 

End If 
Next j 
Next ii 

For k = 1 To m 
For ii = 1 To m 
sum = 0# 

For j = 1 To n 

sum = sum + ar(j, k) * ar(j, ii) 

Next j 

an(k, ii) = sum 
Next ii 
Next k 

For ii = 1 To m 
sum = 0# 

For j = 1 To n 

sum = sum + y(j) * ar(j, ii) 

Next j 

cn(ii) = sum 
Next ii 

'solve x vector of Ax=B where A=A' 

For i = 1 To m 
an(i, nn) = cn(i) 

Next i 

For i = 1 To m 
k = i 

b = Abs(an(i, i)) 

If b = 0# Then 
'zero pivot: swap in the row at or below i with the largest entry in column i 
For j = i To m 

If b < Abs(an(j, i)) Then 
b = Abs(an(j, i)) 

k = j 

End If 
Next j 

For j = 1 To nn 
t = an(i, j) 
an(i, j) = an(k, j) 
an(k, j) = t 
Next j 

End If 

d = an(i, i) 'pivot value, read after any row swap 

For j = 1 To nn 
an(i, j) = an(i, j) / d 
Next j 

For j = 1 To m 
b = an(j, i) 


For k = 1 To nn 
If Not j = i Then 

an(j, k) = an(j, k) - an(i, k) * b 
End If 
Next k 
Next j 
Next i 

'put answers into the coefficient array 
For ii = 1 To m 

coefficientArray(ii) = an(ii, nn) 

Next ii 

curveFit = coefficientArray 
End Function 'curvefit 
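
Once curveFit has returned its coefficients, the fitted polynomial Y = a0 + a1*x + ... + a(n)*x^n still has to be evaluated. One way to do that is Horner's rule, sketched below. `EvalPolySketch` and `DemoEvalPoly` are hypothetical names that do not appear in the original listing; the sketch assumes the layout curveFit uses, with a0 in element 1 and a(n) in element degree + 1.

```vba
' Hypothetical helper: evaluates a0 + a1*x + ... + an*x^n by Horner's rule.
' Assumes coeffs(1) holds a0 and coeffs(degree + 1) holds an.
Function EvalPolySketch(ByVal coeffs As Variant, ByVal degree As Integer, _
                        ByVal x As Double) As Double
    Dim result As Double
    Dim idx As Integer

    result = coeffs(degree + 1)
    For idx = degree To 1 Step -1
        result = result * x + coeffs(idx)
    Next idx
    EvalPolySketch = result
End Function

Sub DemoEvalPoly()
    Dim c(1 To 3) As Variant

    c(1) = 1: c(2) = 2: c(3) = 3          ' 1 + 2x + 3x^2
    Debug.Print EvalPolySketch(c, 2, 2)   ' 1 + 4 + 12 = 17
End Sub
```

Horner's rule avoids repeated use of the power operator and needs only one multiplication and one addition per coefficient.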
Sub testFind() 
'WORKS'... 

'"Phys. Dept., Kakatiya Univ., Warangal, India" 
With ActiveSheet.Range("A:A") 

Set C = .Find("3/1979", LookIn:=xlValues) 

If Not C Is Nothing Then 
firstAddress = C.Address 
MsgBox (firstAddress) 

End If 
End With 

'MsgBox (Search("Phys. Dept., Kakatiya Univ., Warangal, India")) 
End Sub 

' Macro: Cumulative 
' Author: Matt Behnke 
' Created: 9/10/01 
' 
' Description: Computes the cumulative entropy for each term at each time 
'              interval. 
'              Creates summary sheets and graphs based on the computation. 


Sub Cumulative() 

'declare constants: 
dataSheet = "Sheet1" 
copyTo = "Sheet2" 
summarySheet = "Sheet3" 
sliceStart = 4 'first column of timeslices 
termStart = 4 'first row that contains a term 

'fill empty cells in on the datasheet so count rows function works properly 
Sheets("" & dataSheet).Select 
Cells(2, 1) = " " 
Cells(1, 1) = " " 
Cells(1, 2) = " " 

Call copyTerms(dataSheet, copyTo) 

'argument order follows each Sub's declaration 
Call addSums(dataSheet, sliceStart, termStart, copyTo) 

Call fillSheets(dataSheet, copyTo, sliceStart, termStart) 

Call removeFormulas(copyTo, termStart, sliceStart) 

Call createGraph(copyTo, dataSheet, termStart, sliceStart, "Entropy Power Trend", xlPower) 
Call createSummary(dataSheet, copyTo, termStart, sliceStart) 

Call entropyLambda(summarySheet, copyTo, termStart, sliceStart) 

Sheets("" & dataSheet).Name = "Data" '(R12.1) 

Sheets("" & copyTo).Name = "Cumlative_Entropy" '(R12.2) 

Sheets("" & summarySheet).Name = "Summary" '(R5.12) 

End Sub 


' Subroutine: copyTerms 
' Author: Matt Behnke 
' Created: 9/10/01 
' 
' Description: 1) Copies the terms from the data sheet to another sheet where 
'                 the entropy formula will be applied to the data. (R2.1) 
' inputs: dataSheet - name of the datasheet 
'         copyTo - name of the sheet with the copied terms 
' Outputs: none 

Sub copyTerms(ByVal dataSheet As String, ByVal copyTo As String) 
termEnd = CountRows(dataSheet, 1) 
sliceEnd = CountCols(dataSheet, 3) 

Worksheets("Sheet1").Range("A1:" & col(sliceEnd) & termEnd).Copy _ 
Destination:=Worksheets("Sheet2").Range("A1") 
Sheets("" & copyTo).StandardWidth = 9 
End Sub 'copyTerms 


'(R12.3) 


' Subroutine: addSums 
' Author: Matt Behnke 
' Created: 9/10/01 
' 
' Description: 1) Puts the sum of all the term's instances for each time 
'                 interval one row below the data in that interval 
'              2) Puts the cumulative sum of term instances from previous 
'                 time intervals one row below the sum of term instances. 
' inputs: dataSheet - name of the datasheet 
'         copyTo - name of the sheet with the copied terms 
'         sliceStart - start of time slice columns 
'         termStart - start of the terms (rows) 
' Outputs: none 


Sub addSums(ByVal dataSheet As String, ByVal sliceStart As Integer, ByVal termStart As Integer, _ 
ByVal copyTo As String) 

sliceEnd = CountCols(dataSheet, 3) 
x = CountRows(dataSheet, 1) 'last row of the terms 

Sheets("" & dataSheet).Cells(x + 1, 3) = "Sum" 
Sheets("" & dataSheet).Cells(x + 2, 3) = "Sum to date" 

For i = sliceStart To sliceEnd 

'for each column in time slice range put the formula that calcs local sum 
Sheets("" & dataSheet).Cells(x + 1, i).Formula = "=SUM(" & col(i) & termStart & ":" & col(i) & x & ")" 

'place formula on copied sheet also.. where entropy sums will be (R2.5) 
Sheets("" & copyTo).Cells(x + 1, i).Formula = "=SUM(" & col(i) & termStart & ":" & col(i) & x & ")" 

'cumulative number of instances per slice: 
If i = sliceStart Then 
Sheets("" & dataSheet).Cells(x + 2, i).Formula = "=" & col(i) & x + 1 
Else 
Sheets("" & dataSheet).Cells(x + 2, i).Formula = "=" & col(i) & x + 1 & "+" & col(i - 1) & x + 2 
End If 
Next i 

'format the datasheet for print (R12.4) 

Sheets("" & dataSheet).Select 
Call formatSheetForPrint 

End Sub 'addSums 
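
The formulas in addSums are assembled from a `col(i)` helper that converts a column number into its letter; that helper is not shown in this part of the listing. A hypothetical stand-in, under the assumption that `col` performs the usual bijective base-26 conversion, could look like this (`ColumnLetterSketch` and `DemoColumnLetter` are illustrative names):

```vba
' Hypothetical stand-in for the col() helper: converts a 1-based column
' number to its letter form (1 -> A, 26 -> Z, 27 -> AA).
Function ColumnLetterSketch(ByVal n As Long) As String
    Dim s As String

    Do While n > 0
        s = Chr(65 + ((n - 1) Mod 26)) & s
        n = (n - 1) \ 26
    Loop
    ColumnLetterSketch = s
End Function

Sub DemoColumnLetter()
    Debug.Print ColumnLetterSketch(4)    ' D
    Debug.Print ColumnLetterSketch(27)   ' AA
    ' Builds the same kind of formula string addSums writes to the sheet.
    Debug.Print "=SUM(" & ColumnLetterSketch(4) & 4 & ":" & ColumnLetterSketch(4) & 20 & ")"
End Sub
```

The last line of the demo prints =SUM(D4:D20), which is the shape of the local-sum formula placed below each time slice.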


' Subroutine: fillSheets 
' Author: Matt Behnke 
' Created: 9/11/01 
' 
' Description: 1) Places the formula used to calculate the cumulative entropy 
'                 in each row of terms in the first time slice. (R2.2) (R2.3) 
'              2) Calls copy formula to copy the formula from the first time slice 
'                 to all of them (R2.4) 
' inputs: dataSheet - name of the datasheet 
'         copyTo - name of the sheet with the copied terms 
'         sliceStart - start of time slice columns 
'         termStart - start of the terms (rows) 
' Outputs: none 


Sub fillSheets(ByVal dataSheet As String, ByVal copyTo As String, ByVal sliceStart As Integer, _ 
ByVal termStart As Integer) 

termEnd = CountRows(dataSheet, 1) 
sliceEnd = CountCols(dataSheet, 3) 

i = sliceStart 

For x = termStart To termEnd 

'entropy term: -p * LOG(p, 2), with p = (cumulative count of the term) / (cumulative total) 
Sheets("" & copyTo).Cells(x, i).Formula = "=If(SUM(" & dataSheet & "!$" & col(sliceStart) & x & _ 
":" & dataSheet & "!" & col(i) & x & ")=0,0, -SUM(" & dataSheet & "!$" & col(sliceStart) & x & _ 
":" & dataSheet & "!" & col(i) & x & ")/" & dataSheet & "!" & col(i) & termEnd + 2 & _ 
"*LOG(SUM(" & dataSheet & "!$" & col(sliceStart) & x & _ 
":" & dataSheet & "!" & col(i) & x & ")/" & dataSheet & "!" & col(i) & termEnd + 2 & ",2))" 
Next x 

'format the entropy data sheet (R12.4) 

Sheets("" & copyTo).Select 
Call formatSheetForPrint 

With Worksheets("" & copyTo).Columns("C") '(R12.7) 

.ColumnWidth = 43 
End With 

With Worksheets("" & dataSheet).Columns("C") 

.ColumnWidth = 43 
End With 

'call copy formulas subroutine to finish calculation. 

Call copyFormulas(copyTo, termStart, sliceStart) '(R2.4) 

End Sub 'fillSheets 
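
The worksheet formula built in fillSheets computes one entropy term, -p * log2(p), where p is the term's cumulative count divided by the cumulative total, and it returns 0 for a term that has not yet occurred. The same calculation as a plain function is sketched below; `EntropyTermSketch` and `DemoEntropyTerm` are hypothetical names, and the sample counts are invented.

```vba
' Hypothetical helper: one term of the cumulative entropy, in bits.
Function EntropyTermSketch(ByVal cumCount As Double, ByVal cumTotal As Double) As Double
    Dim p As Double

    ' Mirrors the IF(...=0, 0, ...) guard in the worksheet formula.
    If cumCount <= 0 Or cumTotal <= 0 Then
        EntropyTermSketch = 0
        Exit Function
    End If

    p = cumCount / cumTotal
    ' VBA's Log is the natural logarithm; dividing by Log(2) gives bits.
    EntropyTermSketch = -p * (Log(p) / Log(2))
End Function

Sub DemoEntropyTerm()
    Debug.Print EntropyTermSketch(1, 2)   ' p = 0.5, about 0.5 bits
    Debug.Print EntropyTermSketch(0, 2)   ' a term with no instances contributes 0
End Sub
```

Summing this value over all terms in a time slice corresponds to the column sum that addSums places below the copied sheet.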


' Subroutine: copyFormulas 
' Author: Matt Behnke 
' Created: 9/11/01 
' 
' Description: 1) copies the formulas from the first time interval to the rest of the 
'                 intervals 
' inputs: copyTo - name of the sheet with the copied terms 
'         sliceStart - start of time slice columns 
'         termStart - start of the terms (rows) 
' Outputs: none 


Sub copyFormulas(ByVal copyTo As String, ByVal termStart As Integer, ByVal sliceStart As Integer) 
termEnd = CountRows(copyTo, 1) 
sliceEnd = CountCols(copyTo, 3) 

'select the column of the first time slice where the entropy formula has been applied. 
Sheets("" & copyTo).Select 

Range("" & col(sliceStart) & termStart & ":" & col(sliceStart) & termEnd).Select 

Selection.Copy 

'copy the formula from the first time slice's column to every other time slices' column 
For i = sliceStart + 1 To sliceEnd 
Range("" & col(i) & termStart).Select 
ActiveSheet.Paste 
Next i 

End Sub 


' Subroutine: removeFormulas 
' Author: Matt Behnke 
' Created: 9/14/01 
' 
' Description: 1) removes the formulas from the copiedTo sheet (where cumulative entropy 
'                 is). This gives faster worksheet loading time because the cells 
'                 don't need to be calculated every time the worksheet is loaded. 
' inputs: copyTo - name of the sheet with the copied terms 
'         sliceStart - start of time slice columns 
'         termStart - start of the terms (rows) 
' Outputs: none 

Sub removeFormulas(ByVal copyTo As String, ByVal termStart As Integer, ByVal sliceStart As Integer) 


termEnd = CountRows(copyTo, 1) 
sliceEnd = CountCols(copyTo, 3) 

'copy the sheet and paste special (values only) 

Sheets("" & copyTo).Select 

Range("" & col(sliceStart) & termStart & ":" & col(sliceEnd) & termEnd).Select 
Selection.Copy 

Range("" & col(sliceStart) & termStart).Select 

Selection.PasteSpecial Paste:=xlValues, Operation:=xlNone, SkipBlanks:= _ 
False, Transpose:=False 
End Sub 
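
Replacing formulas with their values does not strictly require the Select, Copy and PasteSpecial sequence. A shorter alternative, shown here only as a sketch, assigns a range's Value back to itself. `FreezeFormulasSketch` and `DemoFreezeFormulas` are hypothetical names, and the sheet name and address in the demo are assumptions.

```vba
' Hypothetical alternative to Select / Copy / PasteSpecial: assigning a
' range's Value back to itself replaces its formulas with their results.
Sub FreezeFormulasSketch(ByVal sheetName As String, ByVal rangeAddress As String)
    With Worksheets(sheetName).Range(rangeAddress)
        .Value = .Value
    End With
End Sub

Sub DemoFreezeFormulas()
    ' Assumes a sheet named "Sheet2" exists in the active workbook.
    FreezeFormulasSketch "Sheet2", "D4:H20"
End Sub
```

This form leaves the clipboard and the current selection untouched, which can matter when several macros run in sequence.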


' Subroutine: createGraph 
' Author: Matt Behnke 
' Created: 9/12/01 
' Revised: 9/14, 9/17 
' 
' Description: 1) Creates a chart and names it according to the name in the input. 
'              2) creates a trend-line on the source data, added 9/17 
'              3) formats titles, data series markers, trend-line, chart area (9/10, 14, 17) 
' inputs: sourceSheet - name of the sheet where cumulative entropy has been calculated 
'         dataSheet - original data sheet (co-occurrence matrix from Tech OASIS) 
'         termStart - start of the terms (rows) 
'         sliceStart - start of time slice columns 
'         chartName - name of the chart 
'         trendType - type of trendline to add 
' Outputs: none 

Sub createGraph(ByVal sourceSheet As String, ByVal dataSheet As String, ByVal termStart As Integer, _ 
ByVal sliceStart As Integer, ByVal chartName As String, ByVal trendType As String) 

termEnd = CountRows(dataSheet, 1) 
sliceEnd = CountCols(dataSheet, 3) 

projectYears = (sliceEnd - 4) / 2 'calculate number of units to project trend-line (R3.5) 

Charts.Add 

ActiveChart.ChartType = xlXYScatterSmooth '(R3.1) 

ActiveChart.SetSourceData Source:=Sheets("" & sourceSheet).Range("" & col(sliceStart) & termEnd + 1 & _ 
":" & col(sliceEnd) & termEnd + 1), PlotBy:=xlRows '(R3.2) 

ActiveChart.Location Where:=xlLocationAsNewSheet 
With ActiveChart 
.HasLegend = True 
.HasTitle = True 

.ChartTitle.Characters.Text = "Cumulative Entropy vs. Year" '(R4.1) 

.Axes(xlCategory, xlPrimary).HasTitle = True 

.Axes(xlCategory, xlPrimary).AxisTitle.Characters.Text = "k (Years)" '(R4.3) 

.Axes(xlValue, xlPrimary).HasTitle = True 

.Axes(xlValue, xlPrimary).AxisTitle.Characters.Text = "Entropy Sk (Bits)" '(R4.2) 

End With 

'increase chart area and move legend (R4.7, R4.8) 

ActiveChart.PlotArea.Select 
Selection.Width = 598 
Selection.Height = 395 
ActiveChart.Legend.Select 
Selection.Left = 326 
Selection.Top = 207 

'change line style and marker points style (R4.5, R4.6) 

ActiveChart.SeriesCollection(1).Name = "Cumulative Entropy" 

ActiveChart.SeriesCollection(1).Select 
With Selection.Border 
.Weight = xlThin 
.LineStyle = xlNone 
End With 
With Selection 

.MarkerBackgroundColorIndex = 44 
.MarkerForegroundColorIndex = 45 
.MarkerStyle = xlTriangle 
.Smooth = True 
.MarkerSize = 6 
.Shadow = True 
End With 

currentName = ActiveChart.Name 

'add trendline (R3.3) 

ActiveWorkbook.Charts("" & currentName).SeriesCollection(1).Trendlines.Add 
Sheets("" & dataSheet).Select 

With Charts("" & currentName).SeriesCollection(1).Trendlines(1) 

.Type = trendType 
.Forward = projectYears 
.DisplayEquation = True 
If trendType = xlPower Then 
Worksheets("sheet3").Cells(1, 4).Value = .DataLabel.Text 
End If 

.DisplayRSquared = True 
End With 

'move trendline '(R4.10) 

ActiveChart.SeriesCollection(1).Trendlines(1).DataLabel.Select 
Selection.Left = 494 
Selection.Top = 198 

'increase legend size 
ActiveChart.Legend.Select 
Selection.Width = 201 

'remove border and color fill on plot area 
ActiveChart.PlotArea.Select 
With Selection.Border 
.Weight = xlHairline 
.LineStyle = xlNone 
End With 

Selection.Interior.ColorIndex = xlNone 

'remove border on legend 
ActiveChart.Legend.Select 
With Selection.Border 
.Weight = xlHairline 
.LineStyle = xlNone 
End With 

Call formatChartForPrint(currentName) '(R4.11, R4.12) 


'(R3.4) 

'(R3.6) 

'(R4.9) 


Sheets("" & currentName).Select 

Sheets("" & currentName).Name = chartName '(R3.1) 

End Sub 'CreateGraph 
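
The core of createGraph is a scatter chart with a power trendline that is projected forward and shows its equation. A compact sketch of just that step, written with object variables instead of Select, is given below. `AddPowerTrendSketch` and `DemoAddPowerTrend` are hypothetical names, and the sheet name and addresses in the demo are assumptions; a power trendline also assumes strictly positive x and y values.

```vba
' Hypothetical sketch: scatter chart on a new chart sheet with a power
' trendline projected forward by forwardUnits.
Sub AddPowerTrendSketch(ByVal sheetName As String, ByVal xAddress As String, _
                        ByVal yAddress As String, ByVal forwardUnits As Double)
    Dim ws As Worksheet
    Dim ch As Chart
    Dim ser As Series

    Set ws = Worksheets(sheetName)
    Set ch = Charts.Add
    ch.ChartType = xlXYScatter

    ' Excel may pre-populate series from the current selection; start clean.
    Do While ch.SeriesCollection.Count > 0
        ch.SeriesCollection(1).Delete
    Loop

    Set ser = ch.SeriesCollection.NewSeries
    ser.XValues = ws.Range(xAddress)
    ser.Values = ws.Range(yAddress)
    ser.Trendlines.Add Type:=xlPower, Forward:=forwardUnits, _
        DisplayEquation:=True, DisplayRSquared:=True
End Sub

Sub DemoAddPowerTrend()
    ' Assumes "Sheet3" holds time steps in A2:A21 and entropy values in C2:C21.
    AddPowerTrendSketch "Sheet3", "A2:A21", "C2:C21", 10
End Sub
```

Holding the chart and series in variables avoids depending on which sheet happens to be active when each statement runs.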


' Subroutine: createSummary 
' Author: Matt Behnke 
' Created: 9/17/01 
' 
' Description: 1) Creates a summary sheet showing cumulative entropy in each time interval. 
'              2) calculates predicted values of cumulative entropy based on the trendline equation 
'                 from the cumulative entropy graph. 
'              3) calculates the percent error between actual and predicted values. 
' inputs: dataSheet - original data sheet name (co-occurrence matrix from Tech OASIS) 
'         copyTo - sheetname of sheet containing calculated entropy values 
'         termStart - start of the terms (rows) 
'         sliceStart - start of time slice columns 
' Outputs: none 

Sub createSummary(ByVal dataSheet As String, ByVal copyTo As String, ByVal termStart As Integer, _ 
ByVal sliceStart As Integer) 

termEnd = CountRows(dataSheet, 1) 
sliceEnd = CountCols(dataSheet, 3) 


Sheets("Sheet3").StandardWidth = 16 '(R5.10) 

Sheets("Sheet3").Cells(1, 1) = "Time T" '(R5.1) 

Sheets("Sheet3").Cells(1, 2) = "Slice" 

Sheets("Sheet3").Cells(1, 3) = "Cum Entropy (Actual)" 

Sheets("Sheet3").Cells(1, 5) = "Predicted: " & Chr(10) & "5 years of data" 

Sheets("Sheet3").Cells(1, 6) = "Predicted: " & Chr(10) & "10 years of data" 
Sheets("Sheet3").Rows("1:1").RowHeight = 38.25 '(R5.11) 

Count = 1 

'get the error formula from the power trendline 

firstPart = Sheets("Sheet3").Cells(1, 4).Characters(5, 5).Text '(R5.5) 

secondPart = Sheets("Sheet3").Cells(1, 4).Characters(12, 5).Text '(R5.6) 

For i = sliceStart To sliceEnd 
sliceName = Sheets("" & dataSheet).Cells(termStart - 1, i) 

Sheets("Sheet3").Cells(i - 2, 1) = Count '(R5.2) 

Sheets("Sheet3").Cells(i - 2, 2) = sliceName '(R5.3) 

Sheets("Sheet3").Cells(i - 2, 3) = Sheets("" & copyTo).Cells(termEnd + 1, i) '(R5.4) 

Entropy = Sheets("Sheet3").Cells(i - 2, 3) 

Sheets("Sheet3").Cells(i - 2, 4).Formula = "=((" & firstPart & "*A" & i - 2 & "^" _ 
& secondPart & ")-" & Entropy & ")/" & Entropy '(R5.7) 

Count = Count + 1 

'project 5 years '(R5.8) 

If Count <= 6 Then 

Sheets("Sheet3").Cells(i - 2, 5) = Entropy 
Else 


Sheets("Sheet3").Cells(i - 2, 5) = "=" & firstPart & "*" & "A" & i - 2 & "^" & secondPart 
End If 

'project ten years '(R5.9) 

If Count <= 11 Then 

Sheets("Sheet3").Cells(i - 2, 6) = Entropy 
Else 

Sheets("Sheet3").Cells(i - 2, 6) = "=" & firstPart & "*" & "A" & i - 2 & "^" & secondPart 
End If 

Next i 

Sheets("Sheet3").Select 

Range("D:D").Select 

Selection.NumberFormat = "0.00%" 

Call formatSheetForPrint '(R12.4) 

End Sub 'createSummary 
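
createSummary reads the coefficient and exponent of the power trendline from fixed character positions (5 to 9 and 12 to 16) of the label text, which only works while both numbers keep the same width. A more tolerant way to split the label is sketched below, assuming the label has the form "y = <coefficient>x<exponent>" with the exponent shown as a superscript. `ParsePowerLabelSketch` and `DemoParsePowerLabel` are hypothetical names, and the label in the demo is an invented example.

```vba
' Hypothetical parser for a power trendline label such as "y = 4.6832x0.5123".
Sub ParsePowerLabelSketch(ByVal labelText As String, _
                          ByRef coeff As Double, ByRef exponent As Double)
    Dim body As String
    Dim xPos As Long

    ' Keep only the first line, in case an R-squared line follows the equation.
    If InStr(labelText, vbLf) > 0 Then
        labelText = Left(labelText, InStr(labelText, vbLf) - 1)
    End If

    body = Trim(Mid(labelText, InStr(labelText, "=") + 1))
    xPos = InStr(body, "x")
    If xPos = 0 Then Exit Sub   ' not a power-law label; outputs are left untouched

    ' Val reads the leading number and always treats "." as the decimal point.
    coeff = Val(Left(body, xPos - 1))
    exponent = Val(Mid(body, xPos + 1))
End Sub

Sub DemoParsePowerLabel()
    Dim m As Double, b As Double

    ParsePowerLabelSketch "y = 4.6832x0.5123", m, b
    Debug.Print m, b   ' 4.6832 and 0.5123
End Sub
```

With m and b as numbers, the predicted entropy at time T is m * T ^ b and the relative error is (predicted - actual) / actual, which is what the column 4 formula computes.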


' Subroutine: entropyLambda 
' Author: Matt Behnke 
' Created: 9/19/01 
' Revised: 9/24 - added map k, k+1 stuff 
' 
' Description: 1) Creates a sheet called "entropy lambda" where lambda and the Lyapunov 
'                 number are calculated based on formulas given in the requirements 
' inputs: summarySheet - name of the summary sheet 
'         copyTo - sheetname of sheet containing calculated entropy values 
'         termStart - start of the terms (rows) 
'         sliceStart - start of time slice columns 
' Outputs: none 

Sub entropyLambda(ByVal summarySheet As String, ByVal copyTo As String, ByVal termStart As Integer, _ 
ByVal sliceStart As Integer) 

termEnd = CountRows(copyTo, 1) 
sliceEnd = CountCols(copyTo, 3) 

'create new sheet '(R6.1) 

Sheets.Add 

currentName = ActiveSheet.Name 
Sheets("" & currentName).Name = ("EntropyLambda") 
currentName = ActiveSheet.Name 

'set height of header row and standard column width 
Rows("1:1").RowHeight = 52 
ActiveSheet.StandardWidth = 12 

'copy first four columns of summary sheet 
Sheets("" & summarySheet).Select 
Range("A1:D" & sliceEnd).Select 
Selection.Copy 
Range("A1").Select 
Sheets("" & currentName).Select 
ActiveSheet.Paste 

numRows = CountRows(currentName, 1) 

'copy object (equation objects that display below the data) from macro: 
workbookName = ActiveWorkbook.Name 
ActiveWindow.SmallScroll Down:=-6 
Range("G4").Select 
ActiveWindow.SmallScroll Down:=-9 
Windows("CumEntropyMacro2.xls").Activate 
ActiveSheet.Shapes("Group 5").Select 
Selection.Copy 

Windows("" & workbookName).Activate 
Range("A" & sliceEnd + 3).Select 
ActiveSheet.Paste 
Range("G31").Select 

'headers '(R6.3) 

ActiveSheet.Cells(1, 5) = "Cum_K+1" 
ActiveSheet.Cells(1, 7) = "du_(t-c)" 
ActiveSheet.Cells(1, 8) = "du_(t)" 
ActiveSheet.Cells(1, 9) = "du_(t-2)" 
ActiveSheet.Cells(1, 10) = "du_(t-5)" 

'(R12.6) 
'(R12.5) 
'(R6.2) 

ActiveSheet.Cells(1, 11) = "du_(t-15)" 
ActiveSheet.Cells(1, 12) = "du_(t-20)" 
ActiveSheet.Cells(1, 13) = "C_y_10%" 
ActiveSheet.Cells(1, 14) = "Lambda_B10%_y" 
ActiveSheet.Cells(1, 15) = "B10%_y" 
ActiveSheet.Cells(1, 16) = "C_y_20%" 
ActiveSheet.Cells(1, 17) = "Lambda_B20%" 
ActiveSheet.Cells(1, 18) = "B20%_y" 
ActiveSheet.Cells(1, 19) = "C_y_50%" 
ActiveSheet.Cells(1, 20) = "Lambda_B50%" 
ActiveSheet.Cells(1, 21) = "B50%_y" 
ActiveSheet.Cells(1, 22) = "C_y_100" 
ActiveSheet.Cells(1, 23) = "Lambda_B100" 
ActiveSheet.Cells(1, 24) = "B100_y" 

'fill in column 5: Cum_K+1 '(R6.10) 

For i = 2 To numRows 
ActiveSheet.Cells(i, 5) = "=C" & i + 1 
Next i 

'create the map of k, k+1 to get the trendline equation for '(R6.11) 
'calculating the Lyapunov exponent. 

Call createMapEntropyKK_1(numRows) 

Sheets("" & currentName).Select 


'get formula of trendline from entropy power trend graph 
firstPart = ActiveSheet.Cells(1, 4).Characters(5, 5).Text '(R6.4) 

secondPart = ActiveSheet.Cells(1, 4).Characters(12, 5).Text '(R6.5) 

'get formula of trendline from entropy map k, k+1 
firstPartMap = ActiveSheet.Cells(1, 6).Characters(5, 5).Text '(R6.12) 

secondPartMap = ActiveSheet.Cells(1, 6).Characters(12, 5).Text '(R6.13) 

'rename column 4 header.. in summary and entropy lambda sheets '(R6.3.4) 

Sheets("Sheet3").Cells(1, 4) = "Error (Act vs. Pred)" & Chr(10) & Sheets("Sheet3").Cells(1, 4) 
ActiveSheet.Cells(1, 4) = "Error (Act vs. Pred)" & Chr(10) & ActiveSheet.Cells(1, 4) 

'rename column 6 header '(R6.3.6) 

ActiveSheet.Cells(1, 6) = "Lyaponuv Exp J'(k,k+1) = " & Chr(10) & secondPartMap & "*" & _ 
firstPartMap & " k^(" & secondPartMap & "-1)" 

'change column width of column 4, 6 (Error column) '(R12.8) 


With Worksheets("" & currentName).Columns("D") 

.ColumnWidth = 16 
End With 

With Worksheets("" & currentName).Columns("F") 

.ColumnWidth = 16 
End With 

'place m * b calculation of power trend in column 6 at the end of the data '(R6.6) 

ActiveSheet.Cells(sliceEnd + 2, 6) = "m*b" 

ActiveSheet.Cells(sliceEnd + 3,6) = "" & firstPart & "*" & secondPart 
ActiveSheet.Cells(sliceEnd + 4, 6) = "=" & firstPart & "*" & secondPart 


'place Lyapunov stuff below m*b stuff '(R6.14) 

ActiveSheet.Cells(sliceEnd + 6, 6) = "J'(k,k+1)= " & secondPartMap & "*" & _ 
firstPartMap & " k^(" & secondPartMap & "-1)" 

ActiveSheet.Cells(sliceEnd + 7, 6) = "" & firstPartMap & "*" & secondPartMap 
ActiveSheet.Cells(sliceEnd + 8, 6) = "=" & firstPartMap & "*" & secondPartMap 
ActiveSheet.Cells(sliceEnd + 9, 6) = "=" & secondPartMap & "-1" 

ActiveSheet.Cells(sliceEnd + 8, 7) = "J' coeff" 

ActiveSheet.Cells(sliceEnd + 9, 7) = "J' exponent" 

'fill in lyaponuv data in column 6 '(R6.15) 

jcoeff = ActiveSheet.Cells(sliceEnd + 8, 6) 
jexp = ActiveSheet.Cells(sliceEnd + 9, 6) 

For i = 2 To numRows 

ActiveSheet.Cells(i, 6) = "=" & jcoeff & "*C" & i & "^" & jexp 
Next i 

'place du equations in column 7 below data '(R6.7) 

ActiveSheet.Cells(sliceEnd + 2, 7) = "du = (" & firstPart & "*" & secondPart & ")*T^(" & _ 
secondPart & "-1)" 

ActiveSheet.Cells(sliceEnd + 3, 7) = "du_t-c = (" & firstPart & "*" & secondPart & ")*T_t-c^(" & _ 
secondPart & "-1)" 

'fill in the formulas for derivatives '(R6.8) 

For i = sliceStart - 1 To sliceEnd - 2 

ActiveSheet.Cells(i, 7) = "=" & "$" & col(6) & sliceEnd + 4 & "*$A" & i - 1 & "^(" & _ 
secondPart & "-1)" 

ActiveSheet.Cells(i, 8) = "=" & "$" & col(6) & sliceEnd + 4 & "*$A" & i & "^(" & _ 
secondPart & "-1)" 

If i >= 4 Then 

ActiveSheet.Cells(i, 9) = "=" & "$" & col(6) & sliceEnd + 4 & "*$A" & i - 2 & "^(" & _ 
secondPart & "-1)" 

End If 

If i >= 7 Then 

ActiveSheet.Cells(i, 10) = "=" & "$" & col(6) & sliceEnd + 4 & "*$A" & i - 5 & "^(" & _ 
secondPart & "-1)" 

End If 

If i >= 17 Then 

ActiveSheet.Cells(i, 11) = "=" & "$" & col(6) & sliceEnd + 4 & "*$A" & i - 15 & "^(" & _ 
secondPart & "-1)" 

End If 

If i >= 22 Then 

ActiveSheet.Cells(i, 12) = "=" & "$" & col(6) & sliceEnd + 4 & "*$A" & i - 20 & "^(" & _ 
secondPart & "-1)" 

End If 

'fill in the values in columns 13-24 '(R6.9) 

'10% 

ActiveSheet.Cells(i, 13) = "=" & "$" & col(3) & i & "-" & col(14) & i 

ActiveSheet.Cells(i, 14) = "=(" & "$" & col(15) & i & "*" & col(7) & i & "/((" & col(15) & i _ 
& "*" & col(7) & i & ")+" & col(8) & i & "))^(1/3)" 
ActiveSheet.Cells(i, 15) = 0.1 
'20% 

ActiveSheet.Cells(i, 16) = "=" & "$" & col(3) & i & "-" & col(17) & i 

ActiveSheet.Cells(i, 17) = "=(" & "$" & col(18) & i & "*" & col(7) & i & "/((" & col(18) & i _ 
& "*" & col(7) & i & ")+" & col(8) & i & "))^(1/3)" 

ActiveSheet.Cells(i, 18) = 0.2 

'50% 

ActiveSheet.Cells(i, 19) = "=" & "$" & col(3) & i & "-" & col(20) & i 

ActiveSheet.Cells(i, 20) = "=(" & "$" & col(21) & i & "*" & col(7) & i & "/((" & col(21) & i _ 
& "*" & col(7) & i & ")+" & col(8) & i & "))^(1/3)" 

ActiveSheet.Cells(i, 21) = 0.5 

'75% 

ActiveSheet.Cells(i, 22) = "=" & "$" & col(3) & i & "-" & col(23) & i 

ActiveSheet.Cells(i, 23) = "=(" & "$" & col(24) & i & "*" & col(7) & i & "/((" & col(24) & i _ 
& "*" & col(7) & i & ")+" & col(8) & i & "))^(1/3)" 

ActiveSheet.Cells(i, 24) = 0.75 

Next i 

Call formatSheetForPrint '(R12.4) 

Call createLambdaChart 

End Sub 'entropyLambda 
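
Two pieces of arithmetic sit behind the formulas that entropyLambda writes into the sheet: the slope of the power trend S = m * T^b, which is du = (m*b) * T^(b-1), and the lambda value (B * du_(t-c) / ((B * du_(t-c)) + du_(t)))^(1/3). The sketch below restates both as plain functions so they can be checked away from the worksheet. The function names and the sample numbers are hypothetical; `frac` stands for the fraction B (0.1, 0.2, 0.5 or 0.75 in the listing).

```vba
' Hypothetical helper: slope of the power trend S = m * T^b at time t.
Function PowerTrendSlopeSketch(ByVal m As Double, ByVal b As Double, _
                               ByVal t As Double) As Double
    PowerTrendSlopeSketch = (m * b) * t ^ (b - 1)
End Function

' Hypothetical helper: (frac * duLag / ((frac * duLag) + duNow)) ^ (1/3),
' the expression written into the Lambda columns.
Function LambdaSketch(ByVal frac As Double, ByVal duLag As Double, _
                      ByVal duNow As Double) As Double
    LambdaSketch = (frac * duLag / ((frac * duLag) + duNow)) ^ (1 / 3)
End Function

Sub DemoLambda()
    Dim duNow As Double, duLag As Double

    duNow = PowerTrendSlopeSketch(2, 2, 3)   ' 2 * 2 * 3^1 = 12
    duLag = PowerTrendSlopeSketch(2, 2, 2)   ' 2 * 2 * 2^1 = 8
    Debug.Print duNow, duLag
    Debug.Print LambdaSketch(0.1, duLag, duNow)
End Sub
```

Both helpers assume positive time values and slopes, since a negative base with a fractional exponent raises an error in VBA.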


' Subroutine: createMapEntropyKK_1 
' Author: Matt Behnke 
' Created: 9/24/01 
' 
' Description: Called in entropyLambda, it creates the chart of entropy K and K+1, 
'              gets the equation from the power trendline y=mx^b and puts it on the entropy 
'              lambda sheet so it can be used. 
' inputs: number of rows of data on the entropy lambda sheet 
' Outputs: none 

Sub createMapEntropyKK_1(ByVal numRows As Integer) 

Charts.Add '(R9.1) 

ActiveChart.ChartType = xlXYScatterSmooth 

ActiveChart.SetSourceData Source:=Sheets("EntropyLambda").Range("C2:C" & numRows - 1), _ 
PlotBy:=xlColumns 

ActiveChart.Location Where:=xlLocationAsNewSheet 

'change name and add axis labels 
With ActiveChart 

.HasLegend = True '(R10.4) 

.HasTitle = True 

.ChartTitle.Characters.Text = "Entropy Finite Difference Mapping Sk+1=f(Sk)" '(R10.1) 

.Axes(xlCategory, xlPrimary).HasTitle = True 

.Axes(xlCategory, xlPrimary).AxisTitle.Characters.Text = "Entropy Sk" '(R10.2) 

.Axes(xlValue, xlPrimary).HasTitle = True 

.Axes(xlValue, xlPrimary).AxisTitle.Characters.Text = "Entropy Sk+1" '(R10.3) 

End With 

'increase size of chart and move legend '(R10.5, R10.6) 

ActiveChart.PlotArea.Select 
Selection.Width = 598 
Selection.Height = 395 
ActiveChart.Legend.Select 
Selection.Left = 380 
Selection.Top = 300 

'increase legend size 
ActiveChart.Legend.Select 
Selection.Width = 201 

'remove border and color fill on plot area 
ActiveChart.PlotArea.Select 
With Selection.Border 
.Weight = xlHairline 
.LineStyle = xlNone 
End With 

Selection.Interior.ColorIndex = xlNone 

'remove border on legend 
ActiveChart.Legend.Select 
With Selection.Border 
.Weight = xlHairline 
.LineStyle = xlNone 
End With 

'change line style and marker points style '(R10.10) 

ActiveChart.SeriesCollection(1).Name = "Entropy Map S_k+1, S_k" 

ActiveChart.SeriesCollection(1).Select 
With Selection.Border 
.Weight = xlThin 
.LineStyle = xlNone 
End With 
With Selection 

.MarkerBackgroundColorIndex = 44 
.MarkerForegroundColorIndex = 45 
.MarkerStyle = xlTriangle 
.Smooth = True 
.MarkerSize = 6 
.Shadow = True 
End With 

'format axis and set the correct source data '(R9.3) 

ActiveChart.SeriesCollection(1).Select 

ActiveChart.SeriesCollection(1).XValues = "=EntropyLambda!R2C3:R" & numRows - 1 & "C3" 

ActiveChart.SeriesCollection(1).Values = "=EntropyLambda!R2C5:R" & numRows - 1 & "C5" 
ActiveChart.Axes(xlValue).Select 
With ActiveChart.Axes(xlValue) 

.MinimumScale = 4 
.MaximumScaleIsAuto = True 
'(R10.9) 
'(R9.2) 
.MinorUnitIsAuto = True 
.MajorUnitIsAuto = True 
.Crosses = xlAutomatic 
.ReversePlotOrder = False 
.ScaleType = xlLinear 
.DisplayUnit = xlNone 
End With 

ActiveChart.Axes(xlCategory).Select 
With ActiveChart.Axes(xlCategory) 

.MinimumScale = 4 
.MaximumScaleIsAuto = True 
.MinorUnitIsAuto = True 
.MajorUnitIsAuto = True 
.Crosses = xlCustom 
.CrossesAt = 4 
.ReversePlotOrder = False 
.ScaleType = xlLinear 
.DisplayUnit = xlNone 
End With 

'get current name of chart and add the trendline 
currentName = ActiveChart.Name 

Active Workbook. Charts)"" & currentName). SeriesCollection(l).Trendlines. Add 
'trendline details: 

With Charts("" & currentName).SeriesCollection(l).Trendlines(l) 

.Type = xlPower 
.DisplayEquation = True 
.DisplayRSquared = False 

'put trendline equation onto entropy lambda sheet 
Worksheets("EntropyLambda").Cells(l, 6).Value = .DataLabel.Text 
.DisplayRSquared = True 
End W ith 

'move trendline label 

ActiveChart.SeriesCollection(l).Trendlines(l).DataLabel. Select 
Selection.Left = 494 
Selection.Top = 198 

'format chart for print and change name of chart 
Call formatChartForPrint(currentName) 

'(R10.11, R10.12)

Sheets("" & currentName).Select

Sheets("" & currentName).Name = "Map Entropy K, K+1"

End Sub 'createMapKK1


' Subroutine: createLambdaChart
' Author: Matt Behnke
' Created: 9/19/01
'
' Description: creates a chart based on the lambda calculations; plots three data
'              series: 1) cumulative entropy, 2) lambda, 3) cum entropy - lambda
' Inputs: none
' Outputs: none
'
'(R9.4)
'(R9.5)
'(R9.1)


Sub createLambdaChart() 

chartName = "EntropyLambdaChart" 
termEnd = CountRows("EntropyLambda", 3)

'add chart and set to XY smooth (R7.1) 

Charts.Add 

ActiveChart.ChartType = xlXYScatterSmooth 

'set first data series to use cumulative entropy (R7.3) 

ActiveChart.SetSourceData Source:=Sheets("EntropyLambda").Range("$C$2:$C$" & termEnd), _
PlotBy:=xlColumns

ActiveChart.Location Where:=xlLocationAsNewSheet 


'change title, axis labels.. 

With ActiveChart 

.HasLegend = True '(R8.5) 

.HasTitle = True 

.HasAxis(xlValue, xlPrimary) = True 
.HasAxis(xlValue, xlSecondary) = True 

.ChartTitle.Characters.Text = "Entropy (SB) f(Lambda, B)" '(R8.1) 

.Axes(xlCategory, xlPrimary).HasTitle = True 

.Axes(xlCategory, xlPrimary).AxisTitle.Characters.Text = "k (Years)" '(R8.2) 

.Axes(xlValue, xlPrimary).HasTitle = True 

.Axes(xlValue, xlPrimary).AxisTitle.Characters.Text = "Entropy Sk (Bits)" '(R8.3) 

End With

'change name of dataseries 1 '(R7.2)

'change line style and marker style dataseries 1 '(R8.8, R8.9)

ActiveChart.SeriesCollection(1).Name = "Entropy (Information SH)"

ActiveChart.SeriesCollection(1).Select
With Selection.Border
.ColorIndex = 1
.Weight = xlMedium
.LineStyle = xlDot
End With
With Selection
.MarkerBackgroundColorIndex = 1
.MarkerForegroundColorIndex = 6
.MarkerStyle = xlTriangle
.Smooth = False
.MarkerSize = 9
.Shadow = True
End With

ActiveChart.PlotArea.Select

'add second data series '(R7.4, R7.5)

ActiveChart.SeriesCollection.NewSeries 'data starts in row 2 column 11

ActiveChart.SeriesCollection(2).Values = "=EntropyLambda!R2C13:R" & termEnd & "C13"
ActiveChart.SeriesCollection(2).Name = "Entropy Constant C to relate S_H with S_B (Lambda)"

ActiveChart.SeriesCollection(2).Select 'format 2nd data series (R8.10, R8.11)

With Selection.Border
.Weight = xlThin
.LineStyle = xlNone
End With
With Selection
.MarkerBackgroundColorIndex = 50
.MarkerForegroundColorIndex = 4
.MarkerStyle = xlSquare
.Smooth = False
.MarkerSize = 5
.Shadow = False
End With

'add lambda b series '(R7.6, R7.7)

ActiveChart.SeriesCollection.NewSeries 'lambda with b data is in row 2 column 12
ActiveChart.SeriesCollection(3).Values = "=EntropyLambda!R2C14:R" & termEnd & "C14"
ActiveChart.SeriesCollection(3).Name = "Lambda with B = 10%"
ActiveChart.SeriesCollection(3).AxisGroup = 2

ActiveChart.Axes(xlValue, xlSecondary).HasTitle = True 'set title (R8.4)

ActiveChart.Axes(xlValue, xlSecondary).AxisTitle.Characters.Text = "Lambda = f(fi, u)"

ActiveChart.SeriesCollection(3).Select 'format 3rd data series (R8.12, R8.13)

With Selection.Border 
.Weight = xlThin 
.LineStyle = xlNone 
End With 
With Selection 

.MarkerBackgroundColorIndex = xlAutomatic
.MarkerForegroundColorIndex = xlAutomatic
.MarkerStyle = xlAutomatic 
.Smooth = False 
.MarkerSize = 5 
.Shadow = True 
End With 

'move legend and adjust chart size (R8.6, R8.7)

ActiveChart.PlotArea.Select
Selection.Width = 598
Selection.Height = 395
ActiveChart.Legend.Select
Selection.Left = 326
Selection.Top = 207

'increase legend size
ActiveChart.Legend.Select
Selection.Width = 201


'remove border and color fill on plot area
ActiveChart.PlotArea.Select
With Selection.Border
.Weight = xlHairline
.LineStyle = xlNone
End With

Selection.Interior.ColorIndex = xlNone

'remove border on legend
ActiveChart.Legend.Select
With Selection.Border
.Weight = xlHairline
.LineStyle = xlNone
End With

currentName = ActiveChart.Name
Call formatChartForPrint(currentName) '(R8.14, R8.15)

Sheets("" & currentName).Select
Sheets("" & currentName).Name = chartName

End Sub 'create lambda chart 


' Subroutine: formatSheetForPrint
' Author: Matt Behnke
' Created: 9/19/01
'
' Description: formats the sheet to fit on one page wide (legal size paper),
'              adds header and footer to each sheet and sets orientation to landscape
' Inputs: none
' Outputs: none

Sub formatSheetForPrint()

'column heading 

With ActiveSheet.PageSetup 
.PrintTitleRows = "$3:$3" 

.PrintTitleColumns = "" 

End With 

ActiveSheet.PageSetup.PrintArea = "$A$1:$Y$203"
With ActiveSheet.PageSetup
.LeftHeader = ""
.CenterHeader = "&A in &F"
.RightHeader = ""
.LeftFooter = "&D"
.CenterFooter = "Page &P of &N"
.RightFooter = ""
.LeftMargin = Application.InchesToPoints(0.75)
.RightMargin = Application.InchesToPoints(0.75)
.TopMargin = Application.InchesToPoints(1)
.BottomMargin = Application.InchesToPoints(1)
.HeaderMargin = Application.InchesToPoints(0.5)
.FooterMargin = Application.InchesToPoints(0.5)
.PrintHeadings = False
.PrintGridlines = True
.PrintComments = xlPrintNoComments
'(R11.3)
'(R11.4)
'(R11.5)
.CenterHorizontally = False
.CenterVertically = False 
.Orientation = xlLandscape 
.Draft = False 
.PaperSize = xlPaperLegal 
.FirstPageNumber = xlAutomatic 
.Order = xlDownThenOver 
.BlackAndWhite = False 
.Zoom = False 
.FitToPagesWide = 1 
.FitToPagesTall = 99 
End With 

End Sub 'format sheet for print 


' Subroutine: formatChartForPrint
' Author: Matt Behnke
' Created: 9/19/01
'
' Description: puts headings and footers on charts, sets to landscape
' Inputs: chartName - name of the chart sheet to format
' Outputs: none

Sub formatChartForPrint(ByVal chartName As String)
Charts("" & chartName).Select

With ActiveChart.PageSetup
.LeftHeader = ""
.CenterHeader = ""
.RightHeader = ""
.LeftFooter = "&D"
.CenterFooter = ""
.RightFooter = "&A in &F"
.LeftMargin = Application.InchesToPoints(0.75)
.RightMargin = Application.InchesToPoints(0.75)
.TopMargin = Application.InchesToPoints(1)
.BottomMargin = Application.InchesToPoints(1)
.HeaderMargin = Application.InchesToPoints(0.5)
.FooterMargin = Application.InchesToPoints(0.5)
.ChartSize = xlFullPage
.PrintQuality = 600
.CenterHorizontally = False
.CenterVertically = False
.Orientation = xlLandscape
.Draft = False
.PaperSize = xlPaperLetter
.FirstPageNumber = xlAutomatic
.BlackAndWhite = False
.Zoom = 100
End With

End Sub 'format chart for print 


' Function: CountRows
' Author: ? Revised by: Matt Behnke
' Created: ?
' Revised: 9/10/01
'
' Description: Counts the rows in the supplied worksheet and column number
' Inputs: sheetName - name of the sheet to count the rows in
'         colNum - number of the column to count rows in
' Outputs: number of rows as a double
'
'(R11.6)
'(R11.1)
'(R11.2)


Function CountRows(ByVal sheetName As String, ByVal colNum As Integer) As Double 

On Error Resume Next 

Dim currCell As Range, rowNum As Double 

Sheets("" & sheetName).Select 

If IsNumeric(colNum) Then 
Else 

colNum = 1 
End If 

rowNum = 1 

Set currCell = ActiveSheet.Cells(rowNum, colNum) 

Do While currCell.Value <> ""
rowNum = rowNum + 1 

Set currCell = ActiveSheet.Cells(rowNum, colNum) 

Loop 

CountRows = rowNum - 1 
End Function 'CountRows 


' Function: CountCols
' Author: ? Revised by: Matt Behnke
' Created: ?
' Revised: 9/10/01
'
' Description: Counts the columns in the supplied worksheet and row number
' Inputs: sheetName - name of the sheet to count the columns in
'         rowNum - number of the row to count columns in
' Outputs: number of columns as an integer


Function CountCols(ByVal sheetName As String, ByVal rowNum As Integer) As Integer 

On Error Resume Next 

Dim currCell As Range, colNum As Integer 

Sheets("" & sheetName).Select 

If IsNumeric(rowNum) Then 
Else 

rowNum = 1 
End If 
colNum = 1 

Set currCell = ActiveSheet.Cells(rowNum, colNum) 

Do While currCell.Value <> ""
colNum = colNum + 1 

Set currCell = ActiveSheet.Cells(rowNum, colNum) 

Loop 

CountCols = colNum - 1 
End Function 'CountCols 


' Function: col
' Author: Matt Behnke
' Created: 9/11/01
'
' Description: changes column number into a letter
' Inputs: columnNumber
' Outputs: column letter

Function col(ByVal columnNumber As Integer) As String 

Select Case columnNumber 
Case 1 
col = "A" 

Case 2 
col = "B" 

Case 3 
col = "C" 

Case 4 
col = "D" 

Case 5 
col = "E" 

Case 6 
col="F" 

Case 7 
col = "G" 

Case 8 
col = "H" 

Case 9
col = "I"

Case 10 
col="J" 

Case 11 
col = "K" 

Case 12 
col = "L" 

Case 13 
col = "M" 

Case 14 
col = "N" 

Case 15 
col = "O" 

Case 16 
col="P" 

Case 17 
col = "Q" 

Case 18 
col = "R" 

Case 19 
col="S" 

Case 20 
col = "T" 

Case 21 
col = "U" 

Case 22 
col = "V" 

Case 23
col = "W"
Case 24
col = "X" 
Case 25 
col = "Y" 
Case 26 
col = "Z" 
Case 27 
col = "AA" 
Case 28 
col = "AB" 
Case 29 
col = "AC" 
Case 30 
col = "AD" 
Case 31 
col = "AE" 
Case 32 
col = "AF" 
Case 33 
col = "AG" 
Case 34 
col = "AH" 
Case 35
col = "AI"
Case 36 
col = "AJ" 
Case 37 
col = "AK" 
Case 38 
col = "AL" 
Case 39 
col = "AM" 
Case 40 
col = "AN" 
Case Else
col = "Z" 
End Select 

End Function 'col


APPENDIX F INSPEC DATABASE FIELDS


INSPEC records are divided into the following fields, listed in alphabetic order. Highlighted fields are limit
fields.

AA Author Affiliation
AB Abstract
AI Astronomical Object Indexing
AN Accession Number
AU Author
AV Availability
CC Classification Codes
CD Conference Details
CI Chemical Indexing
CL Copyright Clearance Center Code
CO CODEN
CP Country of Publication
CS Copyright Statement (*)
DE Descriptors
DN Document Number (*)
DOI Digital Object Identifier (*)
DS Dissertation Submission Date
DU Document Collection URL (*)
ED Editor
IB ISBN
ID Identifiers
IS ISSN
LA Language
MD Description of Unconventional Medium
MN Material Identity Number
NI Numerical Data Indexing
OP Original Patent Details
PA Patent Assignee
PD Patent Details
PF Patent File Date
PI Patent Priority Date
PR Price
PY Publication Year
RF Number of References
RN Report Numbers
RT Record Type
SC SO (*)
SF Subfile
SK Sort Key
SO Source
ST SICI of Translation (*)
SU Subject Terms (DE and ID)
TI Title
TL Translator
TR Treatment Codes
UD Update Code
UR Universal Resource Locator (*)

(*) This field is for display only; you cannot search in this field. 


Figure F-1. (“Fields”, 2001) List of INSPEC database fields and descriptions.


APPENDIX G LEARNING CURVE


The analysis of the trends in the number of publications (messages) N vs. time step k
is developed here in an independent approach. The community (macro level) publication
data N_i represent counts of publications (messages) in the coarse-grained partition
bands (A, B, C, D).

The power form of the learning curve is explored. The learning rate for each band is
developed as a performance index, a function of the tasks (messages) performed over time
steps, which is the relationship expected in learning: performance improves as the
number of tasks performed increases. Then, in a stepwise fashion, entropy is introduced
into the learning curve equation, showing how the complexity of the messages being
processed in a technology transition task affects the performance index. This is then
related to the two-dimensional map of a dynamical system.


1. Capacity 


We will compute the organizational capacity in a band as the number of messages
processed on average over the time steps to date. We look at an organization’s production
of messages. The messages an organization produces are allocated to its number of authors
in order to get the average number of messages per author per time step. This is done by
organizational bands. We observe the apparent capacity of the organizations in the “A”
band (the best performers by cumulative messages produced) and allocate it to the
number of authors. This gives what could be considered the best capacity available in
the channel.

In the entropy learning curve model, we use this as the best performance we might
expect. It is well accepted that an individual’s performance, in terms of tasks per unit
time, improves through learning as a function of the number of times the task is
performed (Mazur 1978) (Newell 1981). So the more times, N, that a task is performed,
the more the tasks-per-time-step performance index improves. We observed this in these
models as well. An important part of this research is to develop the relationship between
tasks performed in a time step by an author (on average) and the complexity of the message.

To bridge the gap between communication theory and the capacity of human performance,
an analogy is made between a human accepting input and generating output and a
communication system. This is seen as the overlap in a Venn diagram. The input variance
is represented by the circle to the left, the output variance by the circle to the
right, and the intersection is the amount of transmitted information. Miller (Miller
1956) suggests that an individual is a communication channel. He states for a human,
“when we increase the amount of input information, the transmitted information will
increase at first and will eventually level off at some asymptotic level.” He indicated that
this is the channel capacity of the observer, the human. We also see that there is a
capacity and that the performance levels off. A further discussion of this is found in
Appendix G, Learning Curve (p. 443).

2. Pressure 

Let’s establish a conceptual framework for pressure. Imagine a physical system
with a channel made up of a number of garden hoses, with each hose having a finite
cross-sectional area. We can denote pressure in terms of pounds per cross-sectional area, or
pounds 1 per square inch, say. If the hoses in a band were treated on average as the same
size, we could indicate pressure in pounds per hose. This could be stated in pounds per
channel. We might say the pressure is some force measure per node if the channel were
made up of a collection of nodes strung together in a kind of graph. All the terms in a
given state and node ensemble in state space represent the volume.

1 In this illustration, we are using the engineering sense of pounds mass. Recall that force is
proportional to the second derivative of a length, a step l, with respect to time:
$F \propto d^2 l / dt^2$. The important piece to notice is the math here, not whether we
have the right units on the force. The proportionality constant is mass in Newton’s
equations. For our purposes, let’s not think in terms of force and mass, which are related
to gravitation and our physical world, but look at the mathematical meaning and see the
proportionality constant. For convenience, we will call it a mass, actually a probability mass.

We almost have enough information theory to understand the models. Consider
the entropy as a representation of the terms in a vocabulary that are available to the
researchers in a time step. A researcher reaches into the pool of messages, which are
constituted by terms. We can compute the entropy contribution of a term in a given time
step as a function of p(x) for the term. Summing the entropy contributions of all the
terms, we have the entropy at time step k.
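
As a minimal sketch of the entropy calculation just described, the following Python
fragment treats p(x) as the relative frequency of each term among the terms observed in
one time step and sums the per-term contributions -p(x) log2 p(x). The sample terms, the
function names, and the choice of base-2 logarithms (bits) are illustrative assumptions,
not data or code from this research.

```python
import math
from collections import Counter


def term_entropy_contributions(terms):
    """Map each distinct term to its contribution -p(x) * log2 p(x).

    p(x) is estimated as the term's relative frequency in this time step.
    """
    counts = Counter(terms)
    total = sum(counts.values())
    return {
        term: -(count / total) * math.log2(count / total)
        for term, count in counts.items()
    }


def step_entropy(terms):
    """Entropy (in bits) at one time step: the sum of all term contributions."""
    return sum(term_entropy_contributions(terms).values())


# Hypothetical terms drawn from the messages of one time step k.
terms_at_step_k = ["reuse", "reuse", "ada", "process"]
print(term_entropy_contributions(terms_at_step_k))
print(step_entropy(terms_at_step_k))
```

A vocabulary whose terms are used equally often gives the largest entropy for its size,
while a step dominated by a single term gives an entropy near zero.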


[Figure: A Band Productivity In Pubs (Cum over k)]

[Figure: A Band Productivity Index (Cum over k)]

[Figure: Learning Curve -- A Band (Mean and Capacity)]

Since we have the affiliated publisher information, we can find a performance
indicator per capita for the set in the organization band, as described in the section on
technology transfer system elements. At each time step, we can determine the maximum (on
average) capacity per capita in a band. This will yield a set of capacity productivity
curves representing the community learning in bands. This approach yields a learning
curve that is an average for the set of performers measured in the data set. It therefore
gives an individual-based description of learning within a population of learners. An
individual-based approach views the organization as a population of learners, with
organizational learning as a sum of the individual behaviors. This establishes criteria
for the performance indicator for capability and experience in the N dimension. The next
time step the process is repeated to provide A*. This is repeated for n time steps, where
n is the upper bound over the range of data being examined. This builds a moving
distribution with time-varying performance indicator criteria.

While it is tempting to relate this to the Rogers 1983 adopter profile, we cannot do this
directly with the data as presented. If the performers are ordered by the time step when
they first appear, then the true innovators, early adopters, early majority, and late
majority can be identified. We also do not expect to find the laggards publishing.

The data must therefore include the term count, entropy by term (a calculated
value) and publication rate for author and affiliation allocated to an accession number
(AN). The accession numbers are allocated to bins. These can be a year, a month (year
AN ranges divided by 12) or weekly (year AN numbers divided by 50, since there are 50
updates to the IEEE database per year). While the time step k is set by the bin size, the
interval of meaning is k-c, where c is the number of time steps that improves
convergence of a feedback model. For example, if the bins are weekly, we take a year
offset to publish, request clarification, and another year (from a publishing cycle) to have
the request for clarification be received in a published message.


Nembhard and Uzumeri (Nembhard 2000) studied twelve learning curves. They
found that exponential and hyperbolic learning curves are the best suited for mixed
perceptual and motor learning. The curves analyzed are discussed here for reference. These
represent the major contributions of the underlying learning curve research.

They compared models for aggregation and individual learning. Aggregation
implies that you can sum up individual learning and have a representation of
organizational learning. Although it is possible to derive lower level information from
aggregated data, it is generally difficult to disaggregate organizational level learning into
smaller organizational units where the workplace interventions and changes are actually
implemented. It is also difficult to separate the learning effects from the effects of other
internal and environmental effects (Nembhard 2000). They note that organizational
(aggregate) learning curves are best used for measuring organizational improvements
over time. They also looked at models that would be appropriate for both individual and
aggregate learning, referred to as the combined model. These are summarized below, with the
number in parentheses indicating the goodness of the model as found by Nembhard.


Aggregation models, which permit taking learning measures at the
individual level and aggregating those measures to represent
organizational-level reality:

• (3) DeJong’s learning formula (DeJong 1957)

• (4) Stanford B model (Asher 1956)

• (5) Log linear (Wright 1936)

• (6) S-Curve (Carr 1947)

• (10) Levy’s function (Levy 1965)

Individual models, which permit measures at the individual level but cannot
necessarily be aggregated into a meaningful organizational
aggregate:

• (2) exponential functions (two and three parameter) (Mazur 1978)

• (1) hyperbolic functions (two and three parameter) (Mazur 1978)

Combined models, which permit accounting for empirical data observations
in learning data:

• (8) Pegels’ function (Pegels 1969)

• (11) Knecht’s model (Knecht 1974)

• (2) exponential functions (two and three parameter) (Mazur 1978)


It is useful to present some of the basics of the power law.

$$T = B N^{-\alpha} \qquad \text{(G.1)}$$

or in log-log form

$$\log(T) = \log(B) - \alpha \log(N) \qquad \text{(G.2)}$$

where N is the number of trials and T is the time it takes to perform a task, $-\alpha$ is
the slope and B is the offset reflecting prior experience or trials. Looking at this in terms
of the rate of local learning, dT/dN, we see

$$\frac{dT}{dN} = -\alpha B N^{-\alpha - 1} \qquad \text{(G.3)}$$

We know that one form of learning is exponential. It can arise from any
mechanism that is completely local. Therefore, if there is something that learns on each
local part of performance, independent of any other part, then the change in T (the sum of
the changes to each part of T) is proportional to T:

$$\frac{dT}{dN} = -\alpha T \qquad \text{(G.4)}$$

$$T = B e^{-\alpha N} \qquad \text{(G.5)}$$

Comparing this differential form with the power law shows that the power law is
like exponential learning in which the instantaneous rate $\alpha'$ decreases with N, that is,

$$\frac{dT}{dN} = -\alpha' T \qquad \text{(G.6)}$$

where $\alpha' = \alpha / N$.

The three-parameter hyperbolic is given here in more detail since the variables
can be seen from this form. This is also the best model for describing learning across
populations of individuals. The plots of the hyperbolic and exponential ignore prior
learning (p = 0) for

$$\psi = \kappa \, \frac{x + p}{x + p + r} \qquad \text{(G.7)}$$

such that $\psi, \kappa, p, x \geq 0$ and $p + r > 0$.

$\psi$ is the measure of work performance, and x is the amount of cumulative work in
units of time or number of trials (messages in the case of this research). The parameter
$\kappa$ provides an estimate of the asymptotic limit or maximum performance level that
can be expected when all learning has been completed. The upper bound on $\kappa$ (kappa)
comes from a distribution of workforce performance. In this research, we assume it is what
the originator of the technology (the advocate) could do. For example, assume the SEI is
the most prolific on a technology in a given time step. So if the SEI publishes $u_{SEI}$
messages in a given time step, then $\kappa = 1/u_{SEI}$. Parameter r is the cumulative
production required in order to attain an output level of $\kappa/2$ and represents the
rate at which productivity converges toward $\kappa$. Small values of r indicate that
learning occurs rapidly relative to $\kappa$. The value of r may also be small if the
publishing unit reaches a steady-state limit. This can happen quickly with prior
experience. p represents the individual performing activity’s accumulated prior
experience on a time or cumulative messages basis. The prior experience may be acquired
from work on similar tasks (messages) and is interpreted as the point on the learning
curve where the unit is resuming the learning process.

Note that the denominator (x + p + r) must be nonzero. Since cumulative tasks or time is
positive, this implies p > -r. The model’s first and second derivatives are:

$$\frac{d\psi}{dx} = \frac{\kappa r}{(x + p + r)^2} \qquad \text{(G.8)}$$

$$\frac{d^2\psi}{dx^2} = \frac{-2 \kappa r}{(x + p + r)^3} \qquad \text{(G.9)}$$
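
The roles of the three parameters in (G.7) can be illustrated with a short Python
sketch. The parameter values below are assumptions chosen only for illustration, and the
function names are hypothetical. The slope function uses the closed form obtained by
differentiating (G.7) with respect to x, namely kappa * r / (x + p + r)^2, and the script
compares it against a finite-difference estimate.

```python
def hyperbolic(x, kappa, p, r):
    """Three-parameter hyperbolic learning curve (G.7)."""
    return kappa * (x + p) / (x + p + r)


def hyperbolic_slope(x, kappa, p, r):
    """First derivative of (G.7) with respect to x."""
    return kappa * r / (x + p + r) ** 2


# Assumed illustrative values: asymptote 1.0, half-level reached at r = 20 messages.
KAPPA, R = 1.0, 20.0

# With no prior experience, performance reaches kappa / 2 when x equals r.
print(hyperbolic(R, KAPPA, 0.0, R))

# Prior experience p shifts the curve left: x messages with p prior
# behave like x + p messages with no prior experience.
print(hyperbolic(5.0, KAPPA, 3.0, R), hyperbolic(8.0, KAPPA, 0.0, R))

# Finite-difference check of the slope at x = 10.
h = 1e-6
numeric = (hyperbolic(10.0 + h, KAPPA, 0.0, R) - hyperbolic(10.0 - h, KAPPA, 0.0, R)) / (2 * h)
print(numeric, hyperbolic_slope(10.0, KAPPA, 0.0, R))
```

Because the slope is positive and shrinks as x grows, performance rises toward kappa
with diminishing returns, which is the shape described for the hyperbolic learning curve.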


In order to illustrate the general shape of the learning curves, the hyperbolic and
exponential forms are plotted in Figures G-1 and G-2. Figure G-1 is a plot of a
three-parameter hyperbolic learning curve with the prior learning parameter p set to 0.
Figure G-2 is a plot of a three-parameter exponential learning curve, also with the prior
learning parameter p set to 0. A parameter p > 0 shifts the curves to the left by
the amount p, the prior tasks performed.


[Figure: Hyperbolic Learning Curve (3 parameter, p=0)]

Figure G-1 Hyperbolic Learning Curve (3 Parameter).
(Source: after Nembhard 2000)


[Figure: Exponential Learning Curve (3 parameter, p=0)]

Figure G-2 Exponential Learning Curve (3 Parameter).
(Source: after Nembhard 2000)


3. Trials and Time Relationship 

The basic law of practice has the form of a power law (G.1), also shown above in
log-log form (G.2).

The form of the law of practice is performance time (T) as a function of trials (N).
However, trials are simply a way of marking the temporal continuum (t) into intervals,
each one performance-time long. Since the performance time is itself a monotone
decreasing function of trial number, trials (N) become a nonlinear compression of time
(t). It is important to understand the effect on the law of practice of viewing it in terms
of time or in terms of numbers of trials.
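
The compression of time by trials can be seen numerically. The sketch below assumes
that each trial i takes the time given by the power law (G.1), so that the elapsed time
after N trials is the starting offset plus the running sum of the per-trial times. The
parameter values, the block size of ten trials, and the function names are illustrative
assumptions.

```python
def elapsed_time(n_trials, b, alpha, t0=0.0):
    """Elapsed time after n_trials: t0 plus the sum of B * i**(-alpha), i = 1..N."""
    return t0 + sum(b * i ** (-alpha) for i in range(1, n_trials + 1))


def block_times(b, alpha, block=10, n_blocks=5):
    """Time consumed by each successive block of `block` trials."""
    return [
        elapsed_time(k * block, b, alpha) - elapsed_time((k - 1) * block, b, alpha)
        for k in range(1, n_blocks + 1)
    ]


# Assumed illustrative parameters, not values fitted to the publication data.
for k, dt in enumerate(block_times(b=10.0, alpha=0.5), start=1):
    print(f"trials {10 * (k - 1) + 1}-{10 * k}: {dt:.2f} time units")
```

Each later block of ten trials occupies less clock time than the one before it, so equal
steps along the trial axis correspond to progressively shorter steps along the time axis.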

The control algorithm has the number of messages processed without requiring a
request for feedback as $f(x_k)$. This is the number of messages (trials) input at time step k.
The fundamental relationship between time and trials is obtained as follows:

$$t(N) = T_0 + \sum_{i=1}^{N} T_i = T_0 + \sum_{i=1}^{N} B i^{-\alpha} = T_0 + B \sum_{i=1}^{N} i^{-\alpha} \qquad \text{(G.10)}$$

$T_0$ is the time from the arbitrary time origin to the start of the first trial. This
equation cannot be inverted explicitly to obtain the expression for N(t) that would permit
the basic law (G.1) to be transformed to yield T(t). Instead, we proceed indirectly by means
of the differential form. From (G.10) we obtain

$$\frac{dt}{dN} = T \qquad \text{(G.11)}$$

using the following integral formulation

[integral formulation not recoverable from the source] (G.12)

Now starting with the power law in terms of trials we find

$$\frac{dT}{dt} = \frac{dT/dN}{dt/dN} = \frac{-\alpha T / N}{T} = \frac{-\alpha}{N} \qquad \text{(G.13)}$$

But from (G.1) we get

$$N = \left( \frac{T}{B} \right)^{-1/\alpha} \qquad \text{(G.14)}$$

$$N = \left( \frac{C e^{-t/B}}{B} \right)^{-1/\alpha} = C_1 e^{t/(B\alpha)} \qquad \text{(G.15)}$$

where $C_1$ is $(C/B)^{-1/\alpha}$.

When $\alpha = 1$

$$\frac{dT}{dt} = -\frac{1}{B} T \qquad \text{(G.16)}$$

By solving the differential equation, we get

$$T = C e^{-t/B} \qquad \text{(G.17)}$$

When $\alpha \neq 1$ we have a polynomial that we can integrate, where C is an arbitrary
constant of integration, and if the origin and scale of t are adjusted properly we get

$$T^{(\alpha - 1)/\alpha} = (1 - \alpha) B^{-1/\alpha} t + C \qquad \text{(G.18)}$$

So, we can obtain the trials power law re-expressed in terms of time:

$$\frac{dT}{dt} = -\alpha B^{-1/\alpha} T^{1/\alpha} \qquad \text{(G.19)}$$

Rearranging to $T^{-1/\alpha} \, dT = -\alpha B^{-1/\alpha} \, dt$ and integrating both sides, we get

$$\int_0 \frac{1}{1 - \frac{1}{\alpha}} \, d\!\left( T^{1 - 1/\alpha} \right) = \int_0 -\alpha B^{-1/\alpha} \, dt \qquad \text{(G.20)}$$

$$\frac{\alpha}{\alpha - 1} T^{(\alpha - 1)/\alpha} + C = -\alpha B^{-1/\alpha} t + C_2 \qquad \text{(G.21)}$$

Adjusting the constants of integration to equal 0,

$$\frac{\alpha}{\alpha - 1} T^{(\alpha - 1)/\alpha} = -\alpha B^{-1/\alpha} t \qquad \text{(G.22)}$$

rearranging we get

$$T^{(\alpha - 1)/\alpha} = (1 - \alpha) B^{-1/\alpha} t \qquad \text{(G.23)}$$

$$T = \left[ (1 - \alpha) B^{-1/\alpha} \right]^{\alpha/(\alpha - 1)} t^{\alpha/(\alpha - 1)} \qquad \text{(G.24)}$$

which has a constant as the coefficient, and we can write it as B'

$$T = B' t^{\alpha/(\alpha - 1)} \quad \text{for } \alpha \neq 1 \qquad \text{(G.25)}$$

This is similar to (1.26) with N given as a function of time. Rewriting we see

$$T = B' N^{-\alpha} \qquad \text{(G.26)}$$

Therefore, we now have two possibilities for T

$$T = \begin{cases} B' t^{\alpha/(\alpha - 1)}, & \alpha \neq 1 \\ C e^{-t/B}, & \alpha = 1 \end{cases} \qquad \text{(G.27)}$$


APPENDIX H ANNOTATED BIBLIOGRAPHY TECHNOLOGY TRANSFER


TECHNOLOGY TRANSITION ANNOTATED BIBLIOGRAPHY 

This annotated bibliography contains complete bibliographic citations of most of 
the relevant technology transfer literature for software engineering. It does not include 
experience reports, or case studies in general. Some from IBM, HP and a few other 
notable (Cleanroom) studies are included due to the extensive study on transitioning 
those technologies. 

Most of the citations include a category and keywords. At the end of this
annotated bibliography is the 1988 paper by Przybylinski. This paper provided a data set
to explain entropy in Chapter III. The current bibliography will be updated and posted on
the SEI web site in 2002.

(Abetti 1995) Abetti, Pier A. and Robert W. Stuart. “Entrepreneurship and Technology
Transfer: Key Factors in the Innovation Process,” in Donald L. Sexton and Raymond W.
Smilor (editors), The Art and Science of Entrepreneurship, chapter 7, pages 181-210.
Ballinger Publishing Company, Cambridge MA, 1985.

Category: technology transfer

Key Words: innovation, entrepreneurship

Abstract/Summary: This paper provides a linear model of the technological innovation
process, with two case studies (non-impact magnetic printer and extra-high voltage
transformers) defined in terms of that model. The cases focus on the importance of the
different roles in the innovation process, e.g., gatekeepers, champions, etc.

Referenced by (Przybylinski 1988)

(Adrion 1994) Adrion, W.R.; McOwen, P., “A Three-Pronged Strategy For Technology
Creation, Transfer And Absorption,” in Levine, Linda, ed., Proceedings of the IFIP TC8
Working Conference on Diffusion, Transfer and Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45; 1994; p.309-20

ABSTRACT: The Computer Science Department of the University of Massachusetts,
Amherst has developed a strategy for research, development, industrial interactions and
technology transfer called the "Three Pronged Strategy (TPS)". The principal components
within the Three-Pronged Strategy are: continuing programs of education and fundamental
research in computer science within the Computer Science Department of the University of
Massachusetts, Amherst; a program of focused, or "problem-driven" basic research within
the Center for Real-Time and Intelligent Complex Computing Systems (CRICCS); and a
program of applied research and development and technology transfer within the Applied
Computing Systems Institute of Massachusetts (ACSIOM). In this report, we discuss the
motivation and development of the TPS and our experiences to date. We describe each of
the components of our strategy and suggest how these might be adapted to other
environments.

REF: 0

(Allen 1977) Allen, Thomas John. Managing The Flow Of Technology: Technology
Transfer And The Dissemination Of Technological Information Within The R&D
Organization. The MIT Press, Cambridge, MA, 1977.

Category: communication

Key Words: communication, dissemination

Abstract/Summary: Allen's book summarizes his detailed study of communication
processes and their impact on the technology development process in an R&D environment.
His work has implications on topics such as technical publishing, human resource
development and office design.

Referenced by (Przybylinski 1988)

(Ardis 1994) Ardis, M.A.; Furchtgott, D.G., “Research and development: differences
are barriers to transfer,” in Levine, Linda, ed., Proceedings of the IFIP TC8 Working
Conference on Diffusion, Transfer and Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45; 1994; p.245-7

ABSTRACT: We have discovered several differences between research and
development that frustrate attempts to introduce new software technology into
development. For each of these differences we have found strategies that either reduce
the difference or mitigate its effects.

REF: 1

(Bailey 1982) Bailey, Claudia Lynn. “Technology Transfer: A Compilation of Varied
Approaches to the Management of Innovation,” Master's thesis, Naval Postgraduate School,
December, 1982.

Category: technology transfer

Key Words: innovation, technology management

Abstract/Summary: This master's thesis from the Naval Postgraduate School
provides abstracts of many technology transfer references available during that period.

Referenced by (Przybylinski 1988)

(Barrett 1984) Barrett, Edgar and 
Donna Bergstedt. “The System Texas 
Instruments Developed To Manage Innovation,” 
International Management 0:81-87, May, 1984. 


Category: innovation 

Key Words: technology management, 
technology planning, strategic planning 

Abstract/Summary: This article was 
condensed from a case study prepared by 
Professor Barrett from Southern Methodist 
University. It details the Objectives, Strategy and 
Tactics (OST) system, a layered planning system 
in place at Texas Instruments. OST includes (1) 
a hierarchical goal system; (2) dual 
responsibility (strategy development and 
operations) of line management; and (3) analysis 
of the impacts of a matrix organization on these 
strategic and operating modes. Goals flow from 
high level business objectives to strategies to 
Tactical Action Programs (TAPs), where they 
are implemented on the business unit level. 
About 75% of TI's managers wear both strategic 
and operating hats, which TI believes forces them to 
do long-range thinking. (The full case is 
available from Case Publishing, 46 Glen Street, 
Dover, Massachusetts, 02030.) 

Referenced by (Przbylinski 1988) 

(Bass 1994) Bass L; Soule A, 
“Technology Transition Of User Interface 
Management Systems,” in Levine, Linda, ed., 
Proceedings of the IFIP TC8 Working 
Conference on Diffusion, Transfer and 
Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994. 

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45; 
1994; p.357-68 

ABSTRACT: This paper presents a 
case study of the transition efforts associated 
with an advanced user interface tool. The tool 
(Serpent) was well received scientifically, 
leading to efforts to influence the standards 
community, to commercialize a Serpent product, 
and to formulate a special purpose consortium. 
The results of these efforts are reported. 

REF: 13 

(Bayer 1989) Bayer, Judy and Melone, 
Nancy, “A Critique of Diffusion Theory as a 
Managerial Framework for Understanding 
Adoption of Software Engineering Innovation,” 
0164-1212/89 IEEE, pp. 161-166, 1989. 

REF: 13 

(Besselman 1994) Besselman, J., 
“Position Statement On Software Process 
Innovations And Informal Organizational 
Networks,” in Levine, Linda, ed., Proceedings of 
the IFIP TC8 Working Conference on Diffusion, 
Transfer and Implementation of Information 
Technology, Software Engineering Institute, 
Carnegie Mellon Institute, Pittsburgh, PA, North 
Holland, Amsterdam, London, New York, 
Tokyo, 1994. 

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45; 
1994; p.321-5 

ABSTRACT: The Software 
Engineering Institute (SEI) at Carnegie Mellon 
University spawned the software process 
improvement industry about six years ago 
(1988), with their initial version of the capability 
maturity model (CMM) for software. This 
position statement outlines the author's research 
agenda after reviewing many software 
development organizations over the last few 
years. Most software development organizations 
are engaged in some type of program of software 
process improvement. The inattention paid to the 
informal organization is identified as a weakness 
in many of these software process improvement 
programs. Additionally, a decomposition of what 
constitutes a software process innovation is 
presented as a precursor for developing a 
research model of process innovations covering 
all software development activities. 

REF: 16 

(Bihari 1994) Bihari, T.E.; Varner, 
M.O., “Practical Issues In Information 
Technology Transfer,” in Diffusion, Transfer 
and Implementation of Information Technology, 
in Levine, Linda, ed., Proceedings of the IFIP 
TC8 Working Conference on Diffusion, Transfer 
and Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994. 

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45; 
1994; p.369-72 

ABSTRACT: Adaptive Machine 
Technologies, Inc. (AMT) is an engineering 
research and product development company 
located near the Ohio State University (OSU). 
AMT's strengths are primarily in the areas of 
software and electrical engineering. Since 1984, 
they have been working with OSU personnel on 
various projects. Over the last five years (1989-94), 
AMT has broadened its line of business to 
include commercial product development, in 
partnership with other companies and as 
contractors. AMT frequently works at the 
boundary between university research and 
commercial product development. In that 
position, they have witnessed and been involved 
in a number of projects that fall under the 
umbrella of "university-industry technology 
transfer". Some were official programs but many 
others consisted of general cross-fertilization 
between academics and practitioners working in 
the same application domains. For several years, 
AMT has been working with the OSU Center for 
Mapping (CFM) on projects in the GPS/GIS 
area. CFM collaborates with private sector 
companies like AMT in an attempt to marry the 
intellectual capital of the university with the 
market discipline of the private sector. The CFM 
is an interesting place to study technology 
transfer because, unlike the university, their sole 
mission is to transfer technology. The authors 
present some general observations and 
suggestions, based on experiences with 
university-industry technology transfer at AMT 
and the CFM. 

REF: 1 

(Bikson 1985) Bikson, Tora K., 
Catherine Stasz and Donald A. Mankin, 
“Computer-Mediated Work: Individual and 
Organizational Impact in One Corporate 
Headquarters,” Final Report R-3308-OTA, 
Rand, November, 1985. 

Abstract/Summary: This Rand study 
focused on technology characteristics that 
enhanced the adoption of office automation 
technologies. 

Referenced by (Przbylinski 1988) 

(Borton 1994) Borton, J.M.; 
Brancheau, J.C., “Does An Effective Information 
Technology Implementation Process Guarantee 
Success?” in Diffusion, Transfer and 
Implementation of Information Technology, in 
Levine, Linda, ed., Proceedings of the IFIP TC8 
Working Conference on Diffusion, Transfer and 
Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994. 

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45; 
1994; p.159-78 

ABSTRACT: A model of the IT 
adoption and implementation process is 
described. The model integrates empirical 
information system (IS) research with concepts 
from three theories originally developed in 
referent disciplines. The model is used to guide a 
longitudinal case study through sixteen months 
of qualitative and quantitative data collection. A 
qualitative analysis of the data is presented 
describing the implementation process using a 
temporal (chronological) format. The analysis 
shows that a strong implementation process 
within a supportive environment can overcome 
weaknesses indicated by some of the 
implementation factors. In addition, the effect of 
the interaction of factors within and among the 
stages of the process is clarified, and the cyclical 
nature of the implementation process is validated 
in this research context. This study makes two 
primary contributions to IS research and 
practice. First, the study demonstrates that a 
longitudinal research design combined with a 
mixed quantitative/qualitative data collection 
approach can provide a rich base of data to use 
in examining the IT adoption and 
implementation process. Second, the research 
provides support for the development of a theory-based 
model to guide managers in the planning 
and control of new IT installations. 

REF: 24 

(Brownswood 1994) Brownswood L., 
“Applying Technology Transition In Large 
Software Organizations,” in Levine, Linda, ed., 
Proceedings of the IFIP TC8 Working 
Conference on Diffusion, Transfer and 
Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994. 

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45; 
1994; p.373-6 

ABSTRACT: The author profiles 
consumer organizations that successfully 
transition significant software engineering 
technologies. The success elements are derived 
from the author's experience supporting seven 
software organizations which develop and 
maintain large software systems in the general 
command and control application domain. These 
organizations included three United States 
government contractors, three government 
contractors in Europe and Australia, and a 
United States government agency. The 
technologies these organizations have attempted 
to transition include software engineering, reuse, 
object-oriented technology, Ada, computer aided 
software engineering tools, software 
measurement programs, and continual process 
improvement. The process maturity was typical 
for software organizations of the late 1980's and 
early 1990's. Most organizations had some level 
of defined process, although the formality of 
definition and usage varied. 

REF: 0 

(Buxton 1991) Buxton, J.N. and 
Malcolm, R., Software Technology Transfer, 17-23. 

Category: Transferring of Technology 
Between Businesses 

Key Words: Technology, Participation, 
Complex 

Abstract/Summary: Software 
technology transfer between businesses is long 
and complex. There are two prerequisites for any 
technology to be transferred. First, it must be 
possible to estimate its value in the client 
organization and second the client organization 
must be mature and understand the use of the 
technology. The process of transferring requires 
the participation of many people (i.e. suppliers, 
management, gatekeeper, workers, etc.), 
throughout many unbroken phases (awareness 
of needed technology, decision making, and 
adaptation for use), otherwise the outcome will 
not satisfy the client organization. 

REF: 8 

(Childers 1986) Childers, Terry L., 
“Assessment of the Psychometric Properties of 
an Opinion Leadership Scale,” Journal of 
Marketing Research, XXIII, pp. 184-188, May, 
1986. 

Category: innovation 

Key Words: opinion leader, diffusion 
of innovations 

Abstract/Summary: The concept of 
opinion leadership is central to the study of 
the diffusion of innovations. In this article, 
the author discusses existing efforts to 
develop tools for measuring opinion 
leadership. The paper goes on to describe a 
study in which a modified opinion leadership 
scale (i.e., based on the King and Sommers 
self-designating scale) is shown to have 
higher internal consistency reliability. 

Referenced by (Przbylinski 1988) 

(Christian 1994) Christian, J.T.; Eward, 
M.M., “Transferring Software Engineering 
Technology: The Software Productivity 
Consortium Experience,” in Levine, Linda, ed., 
Proceedings of the IFIP TC8 Working 
Conference on Diffusion, Transfer and 
Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994. 

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45; 
1994; p.377-80 

ABSTRACT: In 1991, the Software 
Productivity Consortium (the Consortium) 
rapidly expanded usage of Consortium products 
by its member companies from fewer than 10 to 
nearly 100 uses. The Consortium accomplished 
this by adopting a view of technology transfer as 
a people-to-people activity, a contact sport. 
Engaging in this contact sport requires applying 
a matrix approach to transferring technology 
that is geared to meeting both common problems 
of the member companies and unique, individual 
information and support needs of member 
company staff. The matrix approach transfers 
each technology through a diverse set of 
products and services, cooperative interactions 
with all member company staff levels, and 
internally-set expectations for transfer 
performance and product quality. 

REF: 0 

(Clapp 1988) Clapp, Judith, 
“Government/industry interaction in Ada 
software engineering tool technology transfer”, 
TH0218-8/88 IEEE p 67-69 

Category: Ada Software Technology Transfer 

Key Words: Ada, program managers, 
government, standard interface, compiler 

Abstract/Summary: The government 
funded the design of Ada ten years ago when no 
other language was found suitable. The 
government made Ada required for certain 
systems and made it difficult for program 
managers to obtain waivers. To counteract the 
risk of tools not operating correctly the 
government has a validation process for the 
compiler. In conclusion, the transfer has been 
difficult partly because Ada was forced in 
through mandates. Risk reduction and feedback 
are necessary and are the link to the technology 
transfer. 


REF: 0 

(Cohen 1994) Cohen, Wesley M. and 
Levinthal, Daniel A., “Fortune Favors the 
Prepared Firm,” Management Science, Vol. 40, 
No. 2, February 1994. 

REF: 52 

(Cohn 1980) Cohn, Steven F. and 
Romaine M. Turyn. “The Structure of the Firm 
and the Adoption of Innovations,” IEEE 
Transactions on Engineering Management 
EM-27(4):98-102, November, 1980. 

Abstract/Summary: This paper 
describes the impact of organizational structure 
on innovativeness. Their hypotheses were that 
adoption varies directly with firm complexity and 
inversely with centralization and formalization. 
It contains a number of references to other work 
in this area. 

Referenced by (Przbylinski 1988) 

(Creighton 1972) Creighton, J. W., J. 
A. Jolly, and S. A. Denning, “Enhancement of 
Research and Development Output Utilization 
Efficiencies: Linker Concept Methodology, in 
the Technology Transfer Process,” Scientific, 
Interim AD-756 694, Naval Postgraduate 
School, June, 1972. 

Category: technology transfer 

Key Words: adoption, innovation, linker 

Abstract/Summary: Creighton et al. 
studied the characteristics of potential 
technology adopters and their organization to 
build a regression model of the technology transfer 
process. Its variables consider innovation, 
motivational and communication aspects. 

Referenced by (Przbylinski 1988) 

(Culver 1994) Culver, Lozo K, 
“Process engineering support for technology 
transfer: strategy and experiences,” in Levine, 
Linda, ed., Proceedings of the IFIP TC8 Working 
Conference on Diffusion, Transfer and 
Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994. 

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45; 
1994; p.327-31 

ABSTRACT: A software engineering 
process group (SEPG) can speed the transfer of 
technology to software development 
organizations. By identifying stable technology 
that addresses the critical software development 
needs of an organization, the SEPG can reduce 
the costs and risks of adopting innovations. The 
SEPG can also represent the software 
development needs of the development 
organization to technology providers in order to 
promote work focused on solving key 
development challenges. 

REF: 1 

(Damsgaard 1994) Damsgaard, J.; 
Rogaczewski, A.; Lyytinen, K., “How 
Information Technologies Penetrate 
Organizations: An Analysis Of Four Alternative 
Models,” in Diffusion, Transfer and 
Implementation of Information Technology, in 
Levine, Linda, ed., Proceedings of the IFIP TC8 
Working Conference on Diffusion, Transfer and 
Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994. 

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45; 
1994; p.1-21 

ABSTRACT: We analyze 
investigations to explain information technology 
penetration processes. A framework is presented 
which serves as a common background for 
exploring four IT penetration models discussed 
in the literature. The framework strives to unify 
theoretical accounts to explain IT penetration 
processes by recognizing six major issues which 
need to be addressed in any model seeking to 
explain IT diffusion. These are: penetration level 
identification criteria, qualitative differences 
between levels, explanative content of the model, 
items of penetration, assumed causal structure 
and underlying theory. The framework is applied 
to analyze the following four IT penetration 
models: Nolan's stage theory (1973), Attewell's 
IT diffusion model (1992), Gurbaxani et al.'s 
institutional model (1990), and Lyytinen's 
transaction cost based model (1991). The 
analysis reveals that each model focuses on 
different aspects of the IT penetration process. 

REF: 29 

(Dean 1974) Dean, Robert C., Jr., “The 
Temporal Mismatch - Innovation's Pace vs 
Management's Time Horizon,” Research 
Management: 12-15, May, 1974. 

Category: innovation 

Key Words: technology management, 
research planning 

Abstract/Summary: The author 
discusses the negative impact on technology 
development of management's focus on short 
term gains. He provides comparisons between 
the American view and those of our competitors. 

Referenced by (Przbylinski 1988) 

(Dean 1987) Dean, James W., 
Jr., “Building the Future: The Justification 
Process for New Technology,” in Johannes M. 
Pennings and Arend Buitendam (editors), New 
Technology as Organizational Innovation: The 
Development and Diffusion of Microelectronics, 
chapter 3, pages 35-58. Ballinger Publishing 
Company, Cambridge, MA, 1987. 

Category: transition evaluation 

Key Words: technology evaluation, 
capital budgeting, technology justification 

Abstract/Summary: This article 
summarizes a recent study by the author that 
looked at “innovation conceptualized as a 
decision making process”, a concept proposed 
by Rogers and others. The sites for the study 
were five manufacturing organizations 
considering the adoption of advanced 
manufacturing technologies, such as computer-aided 
design or manufacturing requirements 
planning. Data came from both semi-structured 
interviews and archival materials, e.g., internal 
memos, letters to and from vendors, etc. Dean 
used Downs and Mohr's "decision to innovate" 
as the unit of analysis. The study focuses on 
three components of the decision process: 
strategic/financial, social, and political. Each 
component is discussed in turn, with examples 
provided from the literature and the study itself. 
Each section includes strategies and tactics 
employed by individuals at different levels in the 
decision making process. 

Referenced by (Przbylinski 1988) 

(Downs 1976) Downs, George W. 
and Lawrence B. Mohr, “Conceptual Issues in 
the Study of Innovation,” Administrative 
Science Quarterly 21:700-714, December, 
1976. 

Category: innovation 

Key Words: innovation theory, 
innovation research 

Abstract/Summary: Downs and Mohr 
discuss four sources of instability in existing 
empirical research: variation among primary 
attributes, interaction, ecological inferences and 
varying operationalizations of innovation. Based 
on this, they recommend seven characteristics 
that new research should have to avoid these 
problems. 

Referenced by (Przbylinski 1988) 

(Downs 1979) Downs, George W. and 
Lawrence B. Mohr, “Toward a Theory of 
Innovation,” Administration & Society 
10(4):379-408, February, 1979. 

Category: innovation 

Key Words: innovation research, innovation 
theory 

Abstract/Summary: This paper continues 
their work from 1976, defining new 
terminology for diffusion and adoption of 
innovations that is a first step in modeling the 
process. 

Referenced by (Przbylinski 1988) 

(Dutton 1981) Dutton, William H., 
“The Rejection of an Innovation: The Political 
Environment of a Computer-Based Model,” 
Systems, Objectives, Solutions 1(4):179-202, 
1981. 

Category: innovation 

Key Words: case study, innovation adoption 

Abstract/Summary: Dutton’s paper 
provides a very detailed case study of the 
rejection of a city planning model. It includes an 
in-depth analysis of context, process and product 
characteristics. 

Referenced by (Przbylinski 1988) 

(Elder 1986) Elder, Victoria, Ada: A 
Case Study of Technology Transfer at DARPA, 
1986. 

Category: technology transfer 

Key Words: case study, DARPA 

Abstract/Summary: This paper was 
produced by the Center for the Productive Use of 
Technology at George Mason University. It 
continues the work started by Havelock and 
looks mainly at the context for Ada adoption in 
the Defense Advanced Research Projects 
Agency. 

Referenced by (Przbylinski 1988) 

(Emerson 1983) Emerson, Thomas J., 
A. Frank Ackerman, Amy S. Ackerman, Priscilla 
Fowler, R. G. Ebenau and R. A. Rosenthal, 
“Training for Software Engineering Technology 
Transfer,” in IEEE Computer Society Workshop 
on Software Engineering Technology Transfer, 
pages 34-41. IEEE, Silver Spring, MD, April, 
1983. 


Category: technology transfer 

Key Words: consultative training, 
technical marketing 

Abstract/Summary: This paper 
describes the work of the Software Engineering 
Technology Transfer group at AT&T Bell 
Laboratories. This group combined good 
technical marketing practices with highly 
tailored training in a process called consultative 
training that was very successful at transferring 
technologies into development groups at Bell 
Labs. 

Referenced by (Przbylinski 1988) 

(Ettlie 1982) Ettlie, John E. and 
William P. Bridges, “Environmental Uncertainty 
and Organizational Technology Policy,” IEEE 
Transactions on Engineering Management 
EM-29(1):2-10, February, 1982. 

Category: innovation 

Key Words: technology management 

Abstract/Summary: Ettlie and 
Bridges look at the impacts of an uncertain 
business environment on the adoption of 
process innovations. 

Referenced by (Przbylinski 1988) 

(Ettlie 1987) Ettlie, John E. and 
William P. Bridges, “Technology Policy and 
Innovation in Organizations,” in Johannes M. 
Pennings and Arend Buitendam (editors), New 
Technology as Organizational Innovation: The 
Development and Diffusion of Microelectronics, 
Chapter 6, pages 117-137. Ballinger Publishing 
Company, Cambridge, MA, 1987. 

Category: innovation 

Key Words: technology policy, 
technology strategy 

Abstract/Summary: This work 
continues the recent trend toward the view that 
organizational innovativeness and success are a 
function of technical strategy. The authors 
contend that innovation is more likely in firms 
with an aggressive, forward-looking technology 
policy which they define as a "long range 
strategy of the organization concerning the 
adoption of new process and material 
innovations and the origination of new product 
or service innovations." They employ two self-reporting 
research methods: self-administered 
questionnaires and open-ended interviews. 
Their research revealed four key aspects of an 
aggressive firm's technology policy: (1) long-range 
commitment and investment in 
technological solutions to problems; (2) planning 
for the human resources needed to implement 
strategic technological plans; (3) openness to the 
environment with an eye toward tracking and 
forecasting technological trends; and (4) 
structural adaptations such as unique positions, 
teams, task forces, and mechanisms for 
functional integration to implement technology 
policies. One particularly interesting finding is 
that "although there are some industry 
differences, the greater the influence the 
government has as a factor in the firm's 
environment the less aggressive the firm's 
technology policy will be." 

Referenced by (Przbylinski 1988) 

(Farmer 1983) Farmer, J. Doyne, “The 
Dimension of Chaotic Attractors,” Physica 7D, 
pp. 153-180, North-Holland Publishing Co., 
1983. 

REF: 46 

(Feldman 1986) Feldman, Martha S., 
“Constraints on Communication and Electronic 
Mail,” in Proceedings CSCW '86, pages 73-90. 
MCC Software Technology Program, Austin, 
TX, December, 1986. 

Category: communication 

Key Words: electronic mail, 
communication networks, weak ties 

Abstract/Summary: Feldman 
discusses how electronic media can create 
communication links between individuals who 
would otherwise not share information. 
Granovetter's work on weak ties suggests that 
these new interactions may greatly influence 
behavior in the organization in question. 

Referenced by (Przbylinski 1988) 

(Fichman 1993) Fichman, Robert G. 
and Kemerer, Chris F., “Adoption of Software 
Engineering Process Innovations: The Case of 
Object Orientation”, Sloan Management Review, 
Winter (1993) 7-22 

Category: Technology Introduction 

Key Words: process innovations, object 
orientation, 4GL, RDB, Diffusion of 
Technology, Economics of Technology 
Standards, relative advantage, compatibility, 
complexity, trialability, observability, prior 
technology drag, irreversibility of investments, 
sponsorship, expectations 

Abstract/Summary: 

Software Development, unlike hardware 
development, seems to be plagued with constant 
problems. This stems from the fact that Software 
Engineering is still relatively new and 
undeveloped. This paper analyzes the adoption 
of three technologies: “structured 
methodologies,” fourth-generation programming 
languages (4GLs), and relational database 
management systems (RDBs). The analysis of 
these technologies is from two perspectives: from 
the Diffusion of Technology (DOT) perspective, 
and from the Economics of Technology 
Standards perspective. 

This paper then goes on to discuss the 
adoption of Object-Oriented (OO) Software 
Engineering Process Technologies. First, it 
gives an overview of the 
concepts of OO. Then, based on the analysis of 
the older technologies, the authors predict that 
OO technology will not be quickly adopted 
outside of academia. 

REF: 22 

(Fichman 1994) Fichman, R.G.; 
Kemerer, C.F., “Toward A Theory Of The 
Adoption And Diffusion Of Software Process 
Innovations,” in Diffusion, Transfer and 
Implementation of Information Technology, in 
Levine, Linda, ed., Proceedings of the IFIP TC8 
Working Conference on Diffusion, Transfer and 
Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994. 

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45; 
1994; p.23-30 

ABSTRACT: It has become 
increasingly clear that no single, strongly 
predictive theory of innovation adoption and 
diffusion is likely to emerge. One response to this 
problem is to work at a higher level of 
abstraction and to identify general classes of 
explanatory factors or characteristic patterns 
related to adoption and diffusion of broadly 
defined innovations in broadly defined contexts. 
Another response is to narrow the focus to more 
specific innovations and contexts, and to develop 
a more strongly predictive theory centered 
around the distinctive characteristics of those 
innovations and contexts. This paper takes the 
latter approach, and, in particular, argues that 
software process innovations (SPIs) (defined as 
a change to an organization's process for 
producing software applications) are 
distinguished by two characteristics: strongly 
increasing returns to adoption and substantial 
knowledge barriers impeding adoption. The 
combination of these two factors suggests that 
the study of the adoption and diffusion of SPIs 
across the internal IS units of large 
organizations will require new explanatory 
variables and knowledge of new patterns of 
diffusion. 

REF: 22 

(Fowler 1993) Fowler, Priscilla and 
Levine, Linda, “Conceptual Framework for 
Software Technology Transition,” Technical 
Report, CMU/SEI-93-TR-31, ESC-TR93-317, 
December 1993. 

ABSTRACT: We present a conceptual framework 
that integrates and describes the intersections of 
three life cycles of software technology 
transition: research and development, new 
product development, and adoption and 
implementation in organizations. We then apply 
the framework to the technology transition 
experiences of the Software Engineering 
Institute. 

REF: 38 

(Fowler 1994) Fowler P. and L. Levine, 
“From theory to practice: Technology Transition 
at the SEI,” 1060-3425/94 1994 IEEE, p 483-497 

Category: Technology Diffusion 

Key Words: Diffusion models, 
diffusion process, technology management, 
mobile phones 

Abstract/Summary: There are three life 
cycles of technology transition: research and 
development, new product development, and 
implementation. This paper discusses the need 
for common terms in comparing development of 
products and the life cycles in depth. 

REF: 32 

(Freeman 1988) Freeman, Peter, 
“Transfer bridge for software technology”, 
TH0218-8/88 IEEE p 8-12 

Category: Software Technology 

Key Words: transfer process, 

application, software technology, transfer bridge 

Abstract/Summary: This paper talks 
about a proposed idea to span the gap between 
production and application of software 
technology. The first model of technology 
transfer consisted of three major functions: 
creation, transfer, and application of software. 
The transfer bridge will have realistic 
educational settings for professionals in postgraduate 
training. The three major 
implementation concerns are cooperation, 
stability, and complementarity. Technology 
transfer problems must be attacked with more 
than one solution. The “transfer bridge” 
concept is just one of many ideas. 

REF: 3 

(Froehling 1981) Froehling, Harold, 
Crutchfield, J.P., Farmer, Doyne, Packard, N.H. 
and Shaw, Rob, “On Determining the Dimension 
of Chaotic Flows,” Physica 3D, pp. 605-617, 
North Holland Publishing Co., 1981. 

REF: 31 

(Gerhart 1994) Gerhart, S.L., “The 
MCC Formal Methods Transition Study: 
Technology Transfer For Complex Information 
Technology And Processes,” in Levine, Linda, 
ed., Proceedings of the IFIP TC8 Working 
Conference on Diffusion, Transfer and 
Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994. 

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45; 
1994; p.249-55 

ABSTRACT: This paper describes a 
technology transfer model used at MCC, a 
research consortium in Austin, Texas, in 1990-1991. 
It discusses the purpose of the project. It 
looks at the nature of the model. It gives some 
details of the project. It discusses experience 
before, during, and after the project and makes 
some generalizations. The interesting features of 
the project from a technology transfer 
perspective are: a successfully executed project, 
but with inconclusive results due to its untimely 
demise; a combination of scholarly investigation, 
ambitious experimentation, and practical, user-oriented 
delivery; a very broad, large scale 
exploration of a complex subject area, driven by 
templates and assessment criteria; and an 
example of what can be produced and a process 
that works in a short (one year) time frame. 

REF: 8 

(Gerstenfeld 1983) Gerstenfeld, Arthur
and Paul D. Berger, “From Basic Research to
Application, A Model of Effective Technology
Transfer.” Technical Report TR-ONR-2, Office
of Naval Research, August 1983.

Category: innovation 

Key Words: technology transfer, 
technology management 

Abstract/Summary: This study traced
sixty projects with links between research and
application through interviews with over 100
individual engineers and scientists. The authors
propose a linear model of technology transfer
which includes organizational, environmental,
people and resources issues, each a composite of
a number of factors. Included are many
illustrations of successful transfer
mechanisms used by the firms interviewed. Their
research showed that the most important factors
were management attitude, entrepreneurship,
timing and dollars. The report concludes with a
number of research questions, including the
authors' proposal that firms use "research
portfolios", with the normal financial risk
factors replaced by "probabilities of application"
and "estimated time of application".

Referenced by (Przbylinski 1988) 

(Ginn 1994) Ginn, M.L., “The 
Transitionist As Expert Consultant: A Case 
Study Of The Installation Of A Real-Time 
Scheduling System In An Aerospace Factory” in 
Levine, Linda, ed., Proceedings of the IFIP TC8
Working Conference on Diffusion, Transfer and
Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon
Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-
(Computer-Science-and-Technology). vol.A-45;
1994; p.179-98 

ABSTRACT: A small group of
transitionists implemented a computerized
scheduling system in an aerospace factory. The
author, one of these transitionists, uses a
qualitative analysis to identify the dynamics that
prevented full and rapid technology transition.
This paper describes this project's diffusion
process and compares and contrasts three
cultures of inquiry (empirical-analytical,
ethnography, and action research) appropriate
to diffusion of innovation research. A new Four
Hills model is introduced that can help assess
risk and plan action steps in regard to four key
roles: sponsors, transitionists, middle managers,
and workers. The Four Hills model also can
extend classic diffusion of innovation research,
attributes of innovations. There is an important
distinction between the implementation of the
new system and the new work method, which is
important in assessing implementation success.
Finally, a metaphor may enhance understanding
of a key implementation issue, middle managers
acting as guardians of the social-work system
which a new work method might disrupt.


REF: 27 

(Glass 1998) Glass, Robert L., “An
Assessment of Systems and Software
Engineering Scholars and Institutions
(1993-1997),” The Journal of Systems and Software 43,
pp. 59-64, 1998.

REF: 8 

(Glasson 1994) Glasson, B.C.,
“ISTRAD: Toward A National Information
Systems And Technology Research And
Development Association,” in Levine, Linda,
ed., proceedings of the IFIP TC8 Working
Conference on Diffusion, Transfer and
Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon
Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-
(Computer-Science-and-Technology). vol.A-45;
1994; p.333-44 

ABSTRACT: The International
Federation for Information Processing (IFIP)
aims to foster research, development,
application, education and information
dissemination in all fields of informatics. IFIP
works through a number of technical committees
each focussing on one aspect of informatics. The
technical committees in turn are responsible for
a small number of working groups. Each
working group focuses on some sub-set of the
field covered by its parent technical committee.
Given its objectives, IFIP is uniquely placed to
foster both hard and soft technology diffusion; it
is doing so with mixed success. The author
describes how the competitive advantage offered 
by one IFIP technical committee has been used 
to improve technology diffusion in the field of 
information systems nationally and 
internationally. He describes a series of 
activities aimed at building up an information 
systems and technology research community in
Australia. An initial, key activity, was to run an 
IFIP supported national seminar series on the 
"State of the Art in Information Systems". The 
seminar series was used to launch the concept of 
a national information systems and technology 
research and development association 
(ISTRAD). An additional outcome was a "State 
of the Art in Information Systems" video which is 
being distributed world-wide as an educational 
resource. 

REF: 12

(Granovetter 1973) Granovetter, Mark
S. “The Strength of Weak Ties,” American
Journal of Sociology 78(6), pp. 1360-1380,
1973.

Category: communication 

Key Words: communication networks, 
influence networks 

Abstract/Summary: This paper
discusses weak ties, a concept that describes
how individuals who are weakly linked in social
terms can exert substantial influence in
communication networks. Weak ties have
implications for technology dissemination
activities.

Referenced by (Przbylinski 1988) 

(Gross 1984) Gross, Pamela H.B. and 
Michael J. Ginzberg “Barriers to the Adoption of 
Application Packages,” Systems, Objectives, 
Solutions 4: pp. 211-226, 1984. 

Category: innovation 

Key Words: innovation adoption 

Abstract/Summary: This paper
describes a qualitative study of technology
adoption. It includes lengthy lists of factors to
consider during technology insertion.

Referenced by (Przbylinski 1988) 

(Grossman 1974) Grossman, Lee, The 
Change Agent, AMACOM, New York, 1974. 

Category: organization change 

Key Words: change agent 

Abstract/Summary: This book, now
out of print, takes an anecdotal approach to
describing the roles and responsibilities of
change agents in organizations.

Referenced by (Przbylinski 1988) 

(Gruber 1969a) Gruber, William H., 
and Donald G. Marquis, Factors in the Transfer 
of Technology, Massachusetts Institute of 
Technology, Cambridge, MA, 1969. 

Category: technology transfer 

Key Words: technology transfer,
innovation

Abstract/Summary: This book is
basically a workshop proceedings from a large
workshop held at MIT attended by some of the
leaders in the field. Many papers in the book are
referenced separately in this list. The summary
paper written by Gruber and Marquis is
outstanding.

Referenced by (Przbylinski 1988) 

(Gruber 1969b) Gruber, William H.
and Donald G. Marquis, “Research on the
Human Factor in the Transfer of Technology,” in
Factors in the Transfer of Technology, Pages
255-282. Massachusetts Institute of Technology,
Cambridge, MA, 1969.

Category: technology transfer 

Key Words: innovation 

Abstract/Summary: This summary
paper contains sections on the following
determinants of technology transfer: training
and experience; individual personality
characteristics; communication patterns;
organizational effects; mission orientation; and
motivation.

Referenced by (Przbylinski 1988) 

(Havelock 1985) Havelock, Ronald G.
and David S. Bushnell. “Technology Transfer at
DARPA - The Defense Advanced Research
Projects Agency: A Diagnostic Analysis.”
Technical Report DTIC AD-A164 457,
Technology Transfer Study Center, George
Mason University, December, 1985.

Category: technology transfer

Key Words: technology transfer, 
DARPA 

Abstract/Summary: This paper
provides an in-depth look at how technology
transfer is planned as a multi-stage process at
DARPA. It discusses the problems inherent in
trying to get government, defense, academic and
contractors to cooperate to a common end. The
case study by Elder is a continuation of this
work.

Referenced by (Przbylinski 1988) 

(Heidtman 1994) Heidtman, S.E.,
“Exploration Of An Incremental Approach To
Technology Transfer And The Issues Affecting
Its Implementation,” in Levine, Linda, ed.,
proceedings of the IFIP TC8 Working
Conference on Diffusion, Transfer and
Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon
Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-
(Computer-Science-and-Technology). vol.A-45;
1994; p.347-51 

ABSTRACT: New system
development technologies promise higher
quality, less costly systems. However,
implementing new technologies is a difficult,
often unsuccessful task. Some of the problems
associated with transition may be resolved
through an incremental technology transition
process. However, the benefits of using this
model are, at this time, unproven. In order to
justify the effort required to implement the
model, its benefits should be clearly established.
The author offers a summary of the incremental
model and discusses barriers to validation and
implementation.

REF: 0 

(Hook 1986) Hook, Audrey A., Terry 
Mayfield, Thomas Frazier, Alan K. Graham and 
David Kreutzer, “Cost Effectiveness Tradeoffs in 
Computer Standardization and Technology 
Insertion,” Technical Report P-1931, Institute for
Defense Analysis, June, 1986. 

Category: transition evaluation 

Key Words: decision support
system, cost model

Abstract/Summary: This report
discusses the feasibility of developing a decision
support system to aid in the use of software
standards and in the development of strategies
for technology insertion. In this study performed
for the Ada Joint Program Office, IDA developed
a prototype system which could simulate some
effects of standardization policies on related
technologies and Mission Critical Computer
Resources costs. The preliminary result obtained
by their prototype is that standardization
policies have a payoff two to three orders of
magnitude greater than their costs.

Referenced by (Przbylinski 1988) 

(Hornbach 1988) Hornbach, Katherine, 
"The Role of Support Staff in the Successful 
Introduction of New Tool Technology", 
TH0218-8/88 IEEE pp. 74-77, 1988. 

Category: Technology introduction,
Technology management

Key Words: Administrative support,
Process support, Cultural issues

Abstract/Summary: This paper
discusses the responsibilities needed in order to
use new tool technology successfully. It 
provides a detailed list of the tasks needed to be 
performed by a support person and includes a 
real-life example illustrating the role of a support 
person in tool introduction. The paper focuses on 
the role of a support person in both 
administrative and process support tasks. It 
describes the necessity of the support person 
when dealing with new tool technology. 

REF: 3 


(Huber 1991) Huber, George P.
“Organizational Learning: The Contributing
Processes and the Literatures”, Organization
Science, Vol 2, No. 1, February 1991.

REF: 204 

(Humphrey 1987a) Humphrey, Watts
S. and William Sweet. “A Method for Assessing
the Software Capability of Contractors,”
Preliminary Report CMU/SEI-87-TR-23,
Software Engineering Institute, July, 1987.

Category: organizational change 

Key Words: process assessment, 
process consultation 

Abstract/Summary: This report
contains a preliminary version of an assessment
instrument jointly developed by the SEI and 
Mitre for the Air Force. It allows contractors to 
perform self-assessments of their software 
capabilities to pinpoint areas for possible 
improvement. If properly used, this tool can help 
determine technology for insertion. 

Referenced by (Przbylinski 1988) 

(Humphrey 1987b) Humphrey, Watts
S. “Characterizing the Software Process: A
Maturity Framework,” Technical Report DTIC
ADA182895, Software Engineering Institute,
June 1987.

Category: organizational change 

Key Words: process assessment, 
process consultation 

Abstract/Summary: This paper
provides the foundation for the process
improvement work at the SEI. It describes a
five-stage framework for the maturity of an
organization's software development activities
based on Humphrey's work at IBM.

Referenced by (Przbylinski 1988) 

(Huseth 1988) Huseth, Steve, “The 
Cost of Technology Transfer,” TH0218-8/88, 
IEEE, pp. 80-81, 1988. 

REF: 5 

(IEEE 83a) The Institute of Electrical 
and Electronics Engineers, Inc. IEEE Standard 
Glossary of Software Engineering Terminology. 
The Institute of Electrical and Electronics 
Engineers, Inc., New York, NY, 1983. 

Referenced by (Przbylinski 1988) 

(IEEE 83b) IEEE Computer Society,
Konover Hotel, Miami Beach, Florida. IEEE
Computer Society Workshop on Software
Engineering Technology Transfer, April 25-27,
1983.

Category: technology transfer 

Key Words: technology transfer 

Abstract/Summary: This proceedings
describes the first workshop of this kind held to
consider software issues. While many of the
papers are good, the best outputs here are the
panel summaries included in the front of the
proceedings. Unfortunately, the panels'
recommendations were not followed up by the
following workshops.

Referenced by (Przbylinski 1988) 

(IEEE Std 1348-1995) IEEE Std 1348- 
1995, IEEE Recommended Practice for the 
Adoption of Computer-Aided Software 
Engineering (CASE) Tools, ISBN 1-55937-591- 
4, IEEE, 1996. 

REF: 25 

(Ignace 1994) Ignace, S.J.; Sedlmeyer,
R.L.; Thuente, D.J., “Integrating Rate-
Monotonic Analysis Into Real-Time Software
Development,” in Levine, Linda, ed.,
proceedings of the IFIP TC8 Working
Conference on Diffusion, Transfer and
Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon
Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-
(Computer-Science-and-Technology). vol.A-45;
1994; p.257-74 

ABSTRACT: Rate-monotonic analysis 
(RMA) is a new technology that provides an 
engineering basis for designing real-time 
systems. During the last two years we have made 
significant progress in integrating this 
technology with our standard software 
development process. We give an account of our 
activities in procuring expertise in, promoting 
the use of, and providing training for rate- 
monotonic analysis. We present our model for 
technology acquisition and discuss how our
experiences relate to established models of 
technology transfer. We also detail two case 
studies which served as convincing examples of 
the utility of this technology. 

REF: 14 

(Isenson 1969) Isenson, Raymond S.
“Project Hindsight: An Empirical Study of the
Sources of Ideas Utilized in Operational Weapon
Systems,” in William H. Gruber and Donald G.
Marquis (editors), Factors in the Transfer of
Technology, chapter 10, pages 155-176. The
M.I.T. Press, Cambridge, MA, 1969.

Category: innovation

Key Words: technology development

Abstract/Summary: There are implicit
assumptions made by various government
agencies that their research and development
money is well spent, with results flowing into
systems into production. This study considered
just that question. The author concludes, among
other things, that this may be true, although
there may be a time lag of up to ten years.

Referenced by (Przbylinski 1988)

(Jaakkola 1995) Jaakkola, Hannu,
“Comparison and Analysis of Diffusion Models”,
pp. 65-82, 1995.

Category: Technology Diffusion

Key Words: Diffusion models,
diffusion process, technology management,
mobile phones

Abstract/Summary: A real diffusion 
process is too complex to put into a model 
accurately. We try our best to model what we 
see but we cannot understand all of the 
interrelations between variables. There are 
several types of models, each with their own 
attributes. This paper focuses on which models 
best fit each situation. 

REF: 33 

(Jeffrey 1988) Jeffrey, H. Joel, "A
Unifying Comprehensive Framework for
Software Technology Transfer", TH0218-8/88,
pp. 82-85.

Category: Technology transfer

Key Words: Conceptual tools,
Linguistic tools, Methodological tools,
Descriptive psychology, Pragmatic evaluation,
Communities, Sociology, Action

Abstract/Summary: This paper describes
the usage of Descriptive Psychology to simplify
the process of transferring technology to more
communities. It explains the human actions
needed to transfer technology successfully,
describing the suitable formulation that is
critical in explaining the key differences
between descriptive psychology and other
approaches. It also describes the formulation
that allows successful communication. The
paper provides a parametric analysis of human
behavior and communities while listing the
steps needed to gain cooperation in a project.
Lastly, the paper contains a pragmatic
evaluation from the applications of these
formulations through Putnam.

REF: 9 

(Kamm 1986) Kamm, Judith B. “The
Portfolio Approach to Divisional Innovation
Strategy,” Journal of Business Strategy 7 (1):
pp. 25-36, Summer, 1986.

Category: innovation 

Key Words: management of
innovation, strategic planning

Abstract/Summary: Organizations often
must juggle the varied needs for product,
process and administrative innovation. In 
addition, failure in one type of innovation can 
lead to failure in the others. The author proposes 
that management (on the divisional level) use a 
portfolio approach to balance these needs. 
Innovation projects are evaluated on two 
different scales: form and objectives. Form 
consists of product, process and administrative 
innovation, classified by whether the changes 
are "revolutionary" or "evolutionary". The 
objectives included are maintaining the business, 
expanding the business or using capacity, 
classified this time into short and long-term 
categories. The author goes on to discuss the 
problems that can arise using this approach, and 
gives examples from field work from 
semiconductor and pharmaceutical firms. 

Referenced by (Przbylinski 1988) 

(Kanter 1983) Kanter, Rosabeth Moss, 
“Change Masters And The Intricate Architecture 
Of Corporate Culture Change,” Sloan 
Management Review, pp. 18-28, October, 1983. 

Category: organizational change 

Key Words: change agents, innovation,
innovation roles

Abstract/Summary: This paper is an
excerpt from her book The Change Masters -
case studies of change in high tech
organizations.

Referenced by (Przbylinski 1988) 

(Kappelman 1994) Kappelman, L.A.;
McLean, E.R., “User engagement in information
system development, implementation, and use:
toward conceptual clarity,” in Levine, Linda, ed.,
proceedings of the IFIP TC8 Working
Conference on Diffusion, Transfer and
Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon
Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-
(Computer-Science-and-Technology). vol.A-45;
1994; p.199-214 

ABSTRACT: Although a great deal of 
research attention has been given to the roles of 
users in information system development and 
implementation, there is a scarcity of common 
models and measurements. Moreover, the 
empirical evidence regarding the value of such 
user roles is mixed. As a consequence, it is 
difficult to make comparisons and 
generalizations based upon this literature. This 
state of affairs is the result of the varied 
conceptualizations and operationalizations of the 
constructs utilized, the somewhat ambiguous use 
of terminology, and other methodological
deficiencies. This paper presents a more 
consistent vocabulary to be used with regard to 
the various ways in which users can be engaged 
in the processes of information system 
development, implementation, and use. Drawing 
upon recent information systems studies, as well 
as the psychological, consumer, and 
organizational behavior literature, a taxonomy 
for the engagement of users with information 
systems is proposed. This framework recognizes 
distinctions among the psychological and 
behavioral components, as well as the task and 
product objects of such engagements. 
Preliminary evidence suggests that such
distinctions can improve the research that is 
currently being undertaken in this important 
area. 

REF: 98 

(Kautz 1994) Kautz, K.; McMaster, T.,
“The failure to introduce system development
methods: a factor-based analysis,” in Levine,
Linda, ed., proceedings of the IFIP TC8 Working
Conference on Diffusion, Transfer and
Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon
Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-
(Computer-Science-and-Technology). vol.A-45;
1994; p.275-87 

ABSTRACT: Structured methods for 
the development of computer-based systems have 
been promoted for more than 20 years, but still 
they are not used in many organisations. We 
investigate the issue of failed attempts to
implement structured methods. On the basis of a
literature study we present a framework for 
analyzing failure and introduce a case study 
showing how such a failure occurred in a 
practical situation. Through critical examination 
of a number of factors we formulate some 
recommendations. These are neither 
generalizable nor offer a guaranteed 
prescription for success, but we feel that they 
have some value in that they may help to 
minimise the risk of failure for the future 
introduction of structured development methods. 

REF: 24 

(Klempa 1994) Klempa, M.J.,
“Management of information technology
diffusion: a meta-force integrative contingency
diffusion model,” in Levine, Linda, ed.,
proceedings of the IFIP TC8 Working
Conference on Diffusion, Transfer and
Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon
Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-
(Computer-Science-and-Technology). vol.A-45;
1994; p.31-52 

ABSTRACT: Prior research analyzes
diffusion of information technology (IT) from
disparate theoretical frameworks, often cross
sectioned in nature, and not utilizing
interactionist perspectives. This paper proposes
an original, holistic, two-tiered contingency IT
diffusion model. The first tier identifies three
meta-forces which drive information technology
acquisition and diffusion (IT/AD): organization
culture, organization learning and knowledge
sharing. Both the characteristics of, as well as
the interaction of, these three meta-forces
determine the organization's creativity, synergy,
and leveraging of IT/AD. These three
meta-forces interact recursively, expressed via both
rational and political organization processes.
Unlike previous nominal IT/AD diffusion models,
the IT/AD contingency model proposed herein is
parsimonious. The second tier of the IT/AD
model delineates secondary IT/AD forces
(moderating variables) which further enhance or
inhibit IT/AD. The second tier also considers the
decision-making/diffusion process coupling. The
complete IT/AD contingency model hypothesizes
clusters of S-shaped diffusion curves. Future
research directions, utilizing positivist,
interpretive, and combined positivist/interpretive
venues, as suggested by the model, are
presented.

REF: 113 

(Kuvaja 1994) Kuvaja, P. “Productivity 
of CASE technology implementation in SW 
development and maintenance on the third 
maturity level,” in Levine, Linda, ed., 
proceedings of the IFIP TC8 Working 
Conference on Diffusion, Transfer and 
Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-
(Computer-Science-and-Technology). vol.A-45;
1994; p.215-29. 

ABSTRACT: This paper reports the 
effects of CASE technology implementation on 
the productivity of software processes at the 
third (defined) maturity level. The results were 
gathered in a life-cycle simulation in which 11 
lower and upper CASE technologies were used 
to develop and maintain the same test software 
system. Productivity was measured in labour 
hours spent on one function point. The results 
show differences and similarities in productivity 
between three classes of CASE technology and 
between development and maintenance. 

REF: 47 

(Leon 1994) Leon, G.; Carracedo, J.;
Yelmo, J.C.; Sanchez, C.; Moreno, J.C.; Gil, J.J.;
Carrasco, J., “An industrial experience of using
an incremental model of technology transfer of
formal development methods,” in Levine, Linda,
ed., proceedings of the IFIP TC8 Working
Conference on Diffusion, Transfer and
Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon
Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-
(Computer-Science-and-Technology). vol.A-45;
1994; p.289-308 

ABSTRACT: This paper describes the
process of transferring formal methods to the
industry and specifically LOTOS and SDL as
representative Formal Description Techniques
(FDTs). For this purpose, a technology
transfer model is described in order to
accelerate their use. This model is conceptually
presented under an incremental approach where
the transference is done in several phases (or
cycles). The first cycle is termed academic;
there, the formalism and its theoretical
framework are introduced. The second one is the
methodological cycle where the emphasis is
placed on the design of large specifications and
its evaluation in a specific application domain to
derive a sound methodological basis. The
industrialization cycle considers the problems of
introduction of the selected technology in the
industrial practice under specific constraints.
The experience of using this model in one
research project (MEDAS) is outlined. The
project included the development of three large
case studies in the telecom field. From this
experience a set of recommendations about how
to transfer FDTs based on the characterization
of industries w.r.t. software technology factors is
proposed.

REF: 16 

(Leonard-Barton 1985a) Leonard-
Barton, Dorothy “Experts as Negative Opinion
Leaders in the Diffusion of a Technological
Innovation,” Journal of Consumer Research 110:
pp. 914-926, March, 1985.

Category: innovation 

Key Words: diffusion research,
transition barriers

Abstract/Summary: Much diffusion of
innovation research suffers from a
"pro-innovation" bias, that is, studies look at the
positive aspects and forces in the spread of an
innovation. In this case, the author is interested
in "negative" opinion leaders, individuals with
stature in a given field that oppose adoption of a
given innovation. Leonard-Barton conducted a
study of the diffusion of the use of non-precious
alloys by prosthodontists (dentists who specialize
in crowns and bridges as restorations). While
most researchers take a sociometric approach
aimed at discovering direct verbal
communication patterns within a closed
community, this study used a lengthy
questionnaire administered to two populations:
a sample from the greater Boston area and a
national sample obtained from professional
societies. While many of her hypotheses were
rejected, there were some interesting results.
Positive opinion leaders must propagate new
skills in addition to providing information.
Negative opinion leaders need only denigrate the
innovation. Leonard-Barton postulates that this
is true any time the innovation requires
acquisition of complex skills in addition to those
required for the alternative product or method.
Equally important is the finding that opinions
formed on the basis of information alone are just
as negative as those based on personal
experience with the innovation.

Referenced by (Przbylinski 1988) 

(Leonard-Barton 1985b) Leonard-
Barton, Dorothy and William A. Kraus,
“Implementing New Technology,” Harvard
Business Review, pp. 102-110, November-
December, 1985.

Category: organization change 

Key Words: innovation, innovation 
roles, risk reduction 

Abstract/Summary: Leonard-Barton
discusses roles in the innovation process, the use
of pilot projects and other general risk reduction
strategies.

Referenced by (Przbylinski 1988) 

(Lien 1994) Lien, L., “Transferring
technologies from developed to developing
industrial and commercial environments,” in
Levine, Linda, ed., proceedings of the IFIP TC8
Working Conference on Diffusion, Transfer and
Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon
Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-
(Computer-Science-and-Technology). vol.A-45;
1994; p.87-98 

ABSTRACT: The author presents the 
current practice of training and operations to 
increase the probability of successful technology 
and information transfer. He addresses the 
process, content, management and factors that 
affect transfer. He discusses the influence of the
project dynamic, capacity of suppliers of 
technology to transfer, and receivers to accept 
and apply. The author offers a management 
framework that allows for effective definition, 
control and verification that technology has been 
transferred. He presents a mathematical model 
that addresses eight factors influencing transfer, 
that can be effectively used to predict the 
probability of successful transfer, and as a 
method to develop alternative transfer scenarios. 

REF: 6 

(Lindgaard 1994) Lindgaard, G., “Some
important factors for successful technology
transfer”, in Levine, Linda, ed., proceedings of
the IFIP TC8 Working Conference on Diffusion,
Transfer and Implementation of Information
Technology, Software Engineering Institute,
Carnegie Mellon Institute, Pittsburgh, PA, North
Holland, Amsterdam, London, New York,
Tokyo, 1994.

SOURCE: IFIP-Transactions-A-
(Computer-Science-and-Technology). vol.A-45;
1994; p.53-66 

ABSTRACT: This paper discusses
some important factors which, to a great extent,
determine the success of technology-based 
products, services and features in the market 
place. In particular, it addresses the issues of 
usefulness, usability and implementation 
strategies employed by organizations undergoing 
technological changes. It is shown that 
usefulness, or the degree to which products 
match users' needs, can determine the success or 
failure of certain products, and that, in many
cases, the number of smart features available to
users by far outweighs those that are actually
being used. Three studies are discussed to
support this point. It shows further that usability 
is quantifiable and measurable, and that product 
development should be guided by usability goals 
and criteria. Usability can and should be 
evaluated throughout the development process in 
an iterative fashion to avoid usability disasters 
at the last minute before a product is released. 
Successful transfer of technology, it is argued, is 
related to careful strategic planning and 
involvement of people whose jobs will be 
affected by the introduction of new technology.

REF: 36 

(Lopata 1994) Lopata, C.L.,
“Implementation scripts: a new approach to
modeling the process,” in Levine, Linda, ed.,
proceedings of the IFIP TC8 Working
Conference on Diffusion, Transfer and
Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon
Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45;
1994; p.231-43

ABSTRACT: This paper presents a 
new, empirically grounded, process model of the 
implementation of information technology in an 
organization. The model is based on a 
longitudinal investigation of the implementation 
of a computer-based information management
system in a three-college library consortium.
Data were collected through interviews with,
and observations of, participants in the
implementation process at various stages in that
process, and through an analysis of pertinent
documents produced by the organization. 
Implementation is conceptualized here as a 
process of mutual adaptation: both the 
technology and the organization, where that 
technology was implemented, were adapted as 
the process unfolded. Using events analysis, in 
combination with script theory, instances of 
adaptation are presented in the context of other 
organizational events and interruptions to the 
process. Patterns of events are then identified 
and aggregated to form scripts for the different 
periods of the implementation process. 

REF: 7 

(Maidique 1980) Maidique, Modesto 
A, “Entrepreneurs, Champions, and 
Technological Innovation,” Sloan Management 
Review 21(2), pp. 59-76, Winter, 1980.

Category: innovation 

Key Words: innovation roles, risk 
reduction 

Abstract/Summary: Maidique
summarizes much of the existing work on roles in
the innovation process. 

Referenced by (Przbylinski 1988)

(Mankin 1984) Mankin, Don, Tora K. 
Bikson and Barbara Gutek. “Factors in 
Successful Implementation of Computer-Based 
Office Information Systems: A Review of the 
Literature with Suggestions for OBM Research,” 
Journal of Organization Behavior Management 
(6/3/4): pp. 1-20, Fall/Winter, 1984. 

Category: innovation 

Key Words: technology transfer, 
innovation, innovation roles 

Abstract/Summary: This paper starts 
with a good, short review of the literature on 
innovation acceptance. It continues to develop a 
communication motivation oriented model of 
organizational interaction. 

Referenced by (Przbylinski 1988)

(Mitchell) Mitchell, K. I.,
“Technology Transfer to & from the Industrial
Sector,” 495-496.

Category: Technology Transfer 

Key Words: communication, cooperation

Abstract/Summary: This paper
provides an idea of what it takes to transfer
technology to and from the Industrial Sector.
Some of the ideas discussed include
communication, strategic alliances, the forms of
strategic alliance, the three-stage model,
funding, personnel exchange, artificial barriers,
transfusion through graduates, and diffusion
models.

REF: 


(Montealegre 1994) Montealegre-R;
Applegate-LM, “Information technology and
organization change: lessons from a less-developed
country,” in Levine, Linda, ed.,
proceedings of the IFIP TC8 Working
Conference on Diffusion, Transfer and
Implementation of Information Technology,
Software Engineering Institute, Carnegie Mellon
Institute, Pittsburgh, PA, North Holland,
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45;
1994; p.99-131

ABSTRACT: The introduction and
assimilation of technology within organizations
has been viewed as a process of organizational 
change that involves the mutual adaptation of 
environment, organization, individual/work 
group, and information technology. The authors 
present a conceptual framework for studying this 
complex phenomenon and illustrate the use of the
framework by analyzing the introduction of 
information technology within a Guatemalan 
sugar company. 

REF: 39 

(Myers 1985) Myers, Ware, “MCC:
Planning the Revolution in Software,” IEEE
Software 2(6), pp. 68-73, November, 1985. 

Category: technology transfer 

Key Words: research consortia 

Abstract/Summary: This interview 
with Les Belady provides insight into MCC's
approach to technology transfer (1985).

Referenced by (Przbylinski 1988) 


(Paulish 1994) Paulish D.J., 
“Experience with software measurement 
technology transfer,” in Levine, Linda, ed., 
proceedings of the IFIP TC8 Working 
Conference on Diffusion, Transfer and 
Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45;
1994; p.381-94


ABSTRACT: The author describes the 
experience of Siemens AG in transferring 
technology associated with the application of 
software measurement to software development 
organizations. This experience was obtained as 
part of the Consortium working on the ESPRIT II
PYRAMID Project. The author describes some of 
the methods used for technology transfer. He 
summarizes the lessons learned about 
technology transfer as a result of the project. An 
approach is given to measure technology 
transfer exposure, and the Siemens results for 
the PYRAMID Project are given. The benefits to 
Siemens resulting from exploitation of 
PYRAMID Project results are summarized. 

REF: 8 

(Pennings 1987) Netherlands Institute 
for Advanced Studies, Groningen, The 
Netherlands. New Technology as Organizational 
Innovation: The Development and Diffusion of 
Microelectronics, 1987. 

Category: innovation 

Key Words: innovation, organizational 
change, technology selection, technology 
justification. 

Abstract/Summary: This book,
published in 1987, is a collection of related
articles on innovation, primarily in high-tech 
industries. Its thirteen chapters each discuss a 
different topic, including technology 
justification, technology policy, high technology 
marketing, and the impacts of information 
technology. 

Referenced by (Przbylinski 1988) 

(Pfleeger 1999) Pfleeger, S. L., 
“Understanding and improving technology 
transfer in software engineering”, The Journal of 
Systems and Software 47 (1999) 111-124 

Category: Technology Introduction 

Key Words: technology transfer 

Abstract: Even at its quickest, it usually
takes decades for a new technology to be widely
adopted as standard practice in government and
industry. This paper, though it never explicitly
defines “technology transfer,” describes the
processes in which technology is transferred
from idea (“Technology Creation”) to adoption
(“Technology Diffusion”). It describes the
processes and roles involved. This paper also
describes ways in which the speed of technology
transfer can be increased.

REF: 15

(Popham 1975) Popham, W. James.
Educational Evaluation, Prentice-Hall, Inc., 
Englewood Cliffs, NJ, 1975. 

Category: transition evaluation 

Key Words: evaluation models 

Abstract/Summary: This book
contains models and methods for educational
evaluation that may be applicable to technology 
transfer. 

Referenced by (Przbylinski 1988) 

(Pries 1994) Pries-Heje-J; Lauesen-S; 
Schroder-B, “Barriers to software technology 
transfer in the Danish electronic equipment 
industry,” in Levine, Linda, ed., proceedings of 
the IFIP TC8 Working Conference on Diffusion,
Transfer and Implementation of Information
Technology, Software Engineering Institute,
Carnegie Mellon Institute, Pittsburgh, PA, North
Holland, Amsterdam, London, New York,
Tokyo, 1994. 

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45;
1994; p.133-7

ABSTRACT: The authors summarize a
study of software technology transfer from
academia to the Danish electronic equipment
industry. The study revealed that only very few
research results are transferred. The lack of
technology transfer is caused by researchers'
lack of knowledge of the real problems in
industry. The study also showed that even very
good and very relevant results sometimes failed 
to be taken into regular use in the industry. 
Many different barriers cause this failure. 
Finally the authors suggest how these large 
technology transfer problems could be 
overcome. 

REF: 5

(Quinn 1979) Quinn, James Brian,
“Technological Innovation, Entrepreneurship,
and Strategy,” Sloan Management Review 20(3),
pp. 19-30, 1979.

Category: innovation

Key Words: organizational evolution

Abstract/Summary: The author talks
about idea generation and product development
during the different stages of an organization's
lifecycle. It includes discussion of the conflicts
between corporate needs and entrepreneurship.

Referenced by (Przbylinski 1988)

(Quinn 1982) Quinn, James Brian and 
James A. Mueller, “Transferring Research 
Results to Operations,” in Michael L. Tushman 
and William L. Moore (editors), Readings in the 
Management of Innovations, pages 60-83. 
Ballinger Publishing Company, Cambridge, MA, 
1982. 

Category: technology transfer

Key Words: receptor groups, technology management

Abstract/Summary: In the authors'
opinion, certain management actions can
stimulate the effective flow of technology within
organizations. They describe a four-step
program to achieve this end: examine resistances
at critical technological points; provide the
information to target research toward company
goals; foster a positive motivational
environment; and plan and control the
exploitation of R&D results.

Referenced by (Przbylinski 1988) 

(Raghavan 1986) Raghavan, Sridhar A.
and Donald R. Chand, “Diffusion of Software
Engineering Methods,” Technical Report TR-86-10,
Wang Institute of Graduate Studies,
November, 1986. 

Category: innovation

Key Words: innovation diffusion

Abstract/Summary: This technical
report provides a good summary of Everett Rogers'
framework for diffusion of innovations.

Referenced by (Przbylinski 1988) 

(Raghavan 1988) Raghavan, Sridhar, 
“Diffusion Software Engineering Innovation,”
TH0218-8/88 IEEE, pp. 116-118, 1988. 

REF: 1 

(Raghavan 1989) Raghavan, Sridhar,
and Chand, Donald R, “Diffusing Software-Engineering
Methods”, 0740-7459/89 IEEE, pp.
81-89 


Category: Diffusing Technology 

Key Words: transfer, practice, 
systematic understanding 

Abstract/Summary: Software
engineers are having a difficult time trying to
find a good framework to study the nature of
software-technology transfer. Although the field
of software engineering has significantly grown,
it has not changed the practice of software
development. The problem of understanding
software-technology transfer is that the software-engineering
innovators are trying to oversimplify
or run away from the problems concerning
technology transfer. To solve the overall
problem it is helpful to get a very thorough
understanding of the processes and problems
and tackle the technology transfer problems
head-on.

REF: 10 

(Ramiller 1994) Ramiller, N.C.; 
Swanson, E.B., “Toward an institutional view of 
information technology diffusion, transfer, and 
implementation,” in Levine, Linda, ed., 
proceedings of the IFIP TC8 Working 
Conference on Diffusion, Transfer and 
Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994. 

SOURCE: IFIP-Transactions-A- 

(Computer-Science-and-Technology). vol.A-45; 
1994; p.353-5 

ABSTRACT: We preview our effort, 
currently underway, to develop theory on the 
development of community images for new
information technologies and on the role these 
images play in the adoption, diffusion, and 
implementation of those technologies. 

REF: 6 

(Redwine 1984) Redwine, Samuel T., 
et al, “DoD Related Software Technology
Requirements, Practices, and Prospects for the
Future,” Technical Report IDA Paper P-1788,
Institute for Defense Analyses, June, 1984.

Category: technology transfer 

Key Words: technology maturation, 
case study 

Abstract/Summary: This study, funded 
by the STARS JPO, considers the maturation 
process for software technologies, including 
Unix and Smalltalk-80. While the study is not 
rigorous, it does provide some general
maturation characteristics and good case
studies.

Referenced by (Przbylinski 1988) 

(Rice 1982) Rice, Ronald E., Bonnie
McD. Johnson, and Everett M. Rogers.
“Facilitation Adoption of New Office
Technology,” 1982 Office Automation Digest:
645-652, April, 1982.

Category: innovation

Key Words: innovation adoption

Abstract/Summary: Building on
Rogers' previous work, this paper discusses a
five-stage model of the innovation process:
agenda-setting, matching, redefining, structuring
and interconnecting.

Referenced by (Przbylinski 1988) 

(Riddle 1984) Riddle, William E. “The 
Magic Number Eighteen Plus or Minus Three: A 
Study of Software Technology Maturation,” 
ACM SIGSOFT Software Engineering Notes 9(2):
pp. 21-37, April, 1984.

Category: technology transfer 
Key Words: technology maturation 
Abstract/Summary: This paper was 
extracted from the Redwine study. 

Referenced by (Przbylinski 1988) 

(Roberts 1981) Roberts, Edward B. and
Alan R. Fusfeld, “Staffing the Innovative
Technology-Based Organization,” Sloan
Management Review: 19-34, Spring, 1981.

Category: innovation

Key Words: innovation roles, transfer
planning

Abstract/Summary: In addition to 
discussing the roles in the innovation process, 
this paper includes a multi-stage view of a 
technical innovation project. The authors 
provide insights into possible implementations 
of each stage. 

Referenced by (Przbylinski 1988) 

(Robertson 1987) Robertson, Thomas S.
and Hubert Gatignon, “The Diffusion of High
Technology Innovations: A Marketing
Perspective,” in Johannes M. Pennings and
Arend Buitendam (editors), New Technology as
Organizational Innovation: The Development 
and Diffusion of Microelectronics, chapter 8, 
pages 179-196. Ballinger Publishing Company, 
Cambridge, MA, 1987. 

Category: innovation 
Key Words: diffusion research, 
technology marketing 


Abstract/Summary: In this article the 
authors attempt to combine results from diffusion 
research from the disciplines of marketing and 
organizational behavior “to derive an enriched
model for the study of technology diffusion”.
They argue that traditional diffusion research 
ignores supply-side factors, such as the 
competitive and marketing actions of innovation 
suppliers. In addition, most existing results do 
not study contextual variables (e.g., industry 
competitiveness, return on investment, and 
industry structure) in great enough depth. The 
paper goes on to list supply-side and contextual 
factors affecting diffusion and contains a number 
of propositions for further study. 

Referenced by (Przbylinski 1988) 

(Rogers 1977) Rogers, Everett M.,
Linda Williams and Rhonda B.
West. Bibliography of the Diffusion of
Innovation. Bibliography Council of Planning
Librarians Exchange Librarians Number 1420-1422,
Institute for Communication Research,
Stanford University, December, 1977.

Category: innovation 

Key Words: diffusion of innovations,
diffusion bibliography

Abstract/Summary: This bibliography 
documents the collection of the Diffusion 
Documents Center at Stanford University. At the 
time this was published the center contained 
approximately 2750 diffusion references. Two 
types of publications are contained: (1) 
empirical diffusion studies and (2) non-empirical 
publications, which include bibliographies, 
summaries of diffusion findings reported in other 
publications and theoretical writings. 

Referenced by (Przbylinski 1988) 

(Rogers 1981) Rogers, Everett M. and 
D. Lawrence Kincaid. Communication Networks:
Toward a New Paradigm for Research, The Free 
Press, New York, 1981. 

Category: communication 

Key Words: communication network
analysis

Abstract/Summary: Rogers and
Kincaid discuss their paradigm for
communication network analysis. Their methods 
can help transfer organizations track the effects 
of their dissemination efforts. 

Referenced by (Przbylinski 1988) 

(Rogers 1983) Rogers, Everett M.,
1983

Category: innovation

Key Words: innovation diffusion 

Abstract/Summary: Rogers’ work is 
the basis upon which most existing diffusion of 
innovations work is built. It is a highly readable 
work that can provide insights into technology 
transfer planning. 

Referenced by (Przbylinski 1988) 

(Roland 1980) Roland, Ronald J., “An 
Interactive Decision Support System for 
Technology Transfer Pertaining to Organization 
and Management,” Technical Report AD-A089968,
Naval Postgraduate School, July,
1980.

Category: technology transfer 

Key Words: decision support systems 

Abstract/Summary: This report more 
fully describes Roland's DSS for technology 
transfer of management practices. 

Referenced by (Przbylinski 1988) 

(Roland 1982) Roland, Ronald J. “A
Decision Support System Model for Technology
Transfer,” Journal of Technology Transfer
7(1):73-93, 1982.

Category: technology transfer 

Key Words: transfer models, transfer
aids

Abstract/Summary: This paper, a 
short version of Roland's technical report from 
the Naval Postgraduate School, briefly describes 
an intelligent system that helps in the design of 
decision support systems. The prototype was 
built using the EMYCIN production rule system 
used at Stanford University. It embodies the 
linker concepts investigated by Creighton et al. 

Referenced by (Przbylinski 1988) 

(Saga 1994) Saga-VL; Zmud-RW, 
“The nature and determinants of IT acceptance, 
routinization, and infusion,” in Levine, Linda, 
ed., proceedings of the IFIP TC8 Working 
Conference on Diffusion, Transfer and 
Implementation of Information Technology, 
Software Engineering Institute, Carnegie Mellon 
Institute, Pittsburgh, PA, North Holland, 
Amsterdam, London, New York, Tokyo, 1994.

SOURCE: IFIP-Transactions-A-(Computer-Science-and-Technology). vol.A-45;
1994; p.67-86

ABSTRACT: Although it is well
recognized that the post-implementation
behaviors, e.g., the acceptance, routinization,
and infusion of information technology (IT), are
critically important to attaining IT
implementation success, the dynamics which 
exist between these behaviors are not, as yet, 
fully understood. Further, these behaviors have 
not been deeply grounded within a theoretical 
foundation, nor have commonly-accepted 
definitions been developed. This paper, through
an extensive review of the research literature
dealing with post-adoption IT implementation
behavior, institutionalization and organizational
learning, integrates what is currently known
about post-adoption behaviors to provide 
definitions of the constructs, and a set of causal 
models which theoretically link the constructs to 
one another as well as to other variables 
understood to significantly influence IT 
implementation success.

ABSTRACT: Small manufacturing
enterprises in Hong Kong are getting 
increasingly globalized and therefore need to use 
decision support tools/technologies to remain 
competitive. As they lack the expertise to deploy 
these tools/technologies, they either avoid their 
use or fail in their successful use or 
institutionalization. A number of approaches 
have been suggested for institutionalization but 
many of them provide only the critical success 
factors and not the dynamics of the 
institutionalization process. The authors suggest 
a process-oriented strategic framework for 
institutionalization of these tools/technologies, 
which identifies critical factors for successful 
institutionalization. Finally, they describe a case 
study where the framework was applied and was 
successful.

Technology Transfer. Non-Proprietary,”
Technical Report STP-309-87, MCC Report, 
October, 1987. 

Category: technology transfer 

Key Words: transfer bibliography,
transfer strategies 

Abstract/Summary: This report
synthesizes the current state of the art and
practice in software technology transfer,
drawing heavily on existing empirical studies. It 
contains an extensive reference list which was 
the source for many of the references included in 
this bibliography. 

Referenced by (Przbylinski 1988)

MIT, October 14, 1983 ONR TR 26, Sloan
School of Management, November, 1983. 

Category: organizational change 

Key Words: change agents 

Abstract/Summary: This speech
discusses the process of organizational change.
It includes a lengthy reference list.

(Schneider 2000) Schneider, Thomas,
“Information Theory Primer,”
www.lecb.ncifcrf.gov/~toms/paper/primer

Category: Information Theory Introduction

Key words: Uncertainty, Shannon, 
Rate, Bit, Noise 

Abstract/Summary: This primer is
written for molecular biologists who are
unfamiliar with information theory. Its purpose
is to introduce you to these ideas so that you can
understand how to apply them to binding sites
(1, 2, 3, 4, 5, 6, 7, 8, 9). Most of the material in
this primer can also be found in introductory
texts on information theory. Although Shannon’s
original paper on the theory of information (10)
is sometimes difficult to read, at other points it is
straightforward. Skip the hard parts, and you
will find it enjoyable. Pierce later published a
popular book (11) which is a great introduction
to information theory. Other introductions are
listed in reference (1). A workbook that you may
find useful is reference (12). Shannon’s complete
collected works have been published (13).
Information about ordering this book is given in
http://www.lecb.ncifcrf.gov/~toms/bionet.info-theory.faq.html#REFERENCES-Information Theory.

Key Words: product champion

Abstract/Summary: This paper
summarizes a study conducted by Arthur D.
Little Inc., under a contract administered by the
National Inventors Council supported by the
military services. It provides information on why
inventors fail and suggests patterns for success,
based on the concept of product champions.

(Schon 1967) Schon, Donald A.
Technology and Change: The New Heraclitus,
Delacorte Press, New York NY, 1967.

Category: organizational change 

Key Words: resistance to change,

Abstract/Summary: Schon discusses
an organization's natural ambivalence to change.
Firms must both resist and espouse innovation. 
The first chapter includes some general 
definitions of technology and innovation that 
embrace those of Rogers and others but are 
much more understandable.

Harvard Business Review: 68-74, March/April,
1987.

Abstract/Summary: This recent HBR
article discusses how to kill development
projects, i.e., when rationality should rule over
emotional attachment.

Konover Hotel, Miami Beach FL, April 25-27,
1983.

Category: technology transfer 

Key Words: tool transfer 

Abstract/Summary: Taylor discusses 
the use of toolsmiths as a technology transfer 
mechanism in a Unix environment. While this 
method can be highly successful, he points out 
the disadvantages for management.

et al. “The Process of Technological Innovation:
Reviewing the Literature,” Technical Report,
National Science Foundation, May, 1983.

Category: innovation 

Key Words: innovation diffusion, 
bibliography 

Abstract/Summary: This extensive
NSF study is must reading for those interested in
the management of innovation. It summarizes
much existing work, while also comparing
research across disciplines. It includes a forty-page
reference list.

Referenced by (Przbylinski 1988) 

(Tushman 1979) Tushman, Michael L.
“Managing Communication Network in R&D
Laboratories,” Sloan Management Review 20:37-49,
Winter, 1979.

Category: communication 

Key Words: communication networks, 
boundary spanners 

Abstract/Summary: This paper
continues the work started by Allen. Tushman
discusses his contingency model for managing
communication in R&D, which includes the
concept of boundary spanners.

Key Words: management of innovation

Abstract/Summary: This book
contains reprints of many pertinent articles,
mostly from the Sloan Management Review. A
number of references in this bibliography were
reprinted here.

(Twiss 1980) Twiss, Brian C.
Managing Technological Innovation. Longman,

Category: innovation
Key Words: technology management 
Abstract/Summary: This book takes a 
pragmatic approach to technology management, 
including chapters on financial evaluation of 
R&D projects, organization for innovation and 
technology forecasting.

“The Need for Some Innovative Concepts of
Innovation: An Examination of Research on the
Diffusion of Innovations,” Policy Sciences 5,
pp. 433-451, 1974.

Category: innovation 
Key Words: diffusion research 
Abstract/Summary: Warner discusses
the definitional problems which are the basis for
inconsistencies in diffusion research performed
by different disciplines. He feels that much basic
conceptualization and theorizing must be
performed before research can move forward.

Management Science in Federal Agencies. The
Adoption and Diffusion of a SocioTechnical

Category: innovation

Key Words: management of innovation,
organizational change

Abstract/Summary: This book
documents White's study of the insertion of
management science technology into federal
agencies. Management science is similar to
software technology in that it also has both tool
and process components. White includes a
detailed model of the organizational
responses/changes caused by new technology.

Referenced by (Przbylinski 1988)

(Wright 1969) Wright, Philip. 
“Government Efforts to Facilitate Technical 
Transfer,” in William H. Gruber and Donald G. 
Marquis (editors), Factors in the Transfer of 
Technology, chapter 14, pages 238-251. The 
M.I.T. Press, Cambridge, MA, 1969. 

Category: technology transfer 
Key Words: transfer evaluation 
Abstract/Summary: This paper
documents the study of NASA’s technology
transfer efforts. While NASA is often touted as
an example of effective technology transfer, this
study can provide no hard evidence.

Referenced by (Przbylinski 1988) 

(Wylie 1982) Wylie, C. Dennis and 
Robert R. Mackie. “Factors Influencing 
Organizational Acceptance of Technological 
Change” in Training. Final Technical CRG-TR-82-018,
Canyon Research Group, Inc., October,
1982. 

Category: organizational change 

Key Words: innovation adoption, 
adoption models 

Abstract/Summary: This report,
funded by the Organizational Effectiveness
Research Group at the Office of Naval Research,
contains a good literature review on the factors
in innovation acceptance. It also includes a
predictive model of organizational acceptance
based mostly on communication and
motivational factors.